Azure Table Storage: Retrieve all data by the same, specific partition key

I'm trying to retrieve a list of order items with the same orderID (orderID is the partition key) from table storage. For example, under one orderID such as U001 there will be multiple productIDs acting as row keys.
The problem is, as far as I know, you can only retrieve data from table storage by specifying BOTH the partition key and the row key. Is there any way to get all the data in the table by specifying only the partition key?

The problem is, as far as I know, you can only retrieve data from table storage by specifying BOTH the partition key and the row key.
Not true. Because a partition key/row key combination uniquely identifies an entity, you specify both only when you want to fetch a single entity.
Is there any way to get all the data in the table by specifying only the partition key?
Yes. You would need to query the entities in your table. Your query (filter criteria) would be PartitionKey eq 'your-partition-key'. That way you will fetch all entities matching your partition key.
Please see this link for more details: https://learn.microsoft.com/en-us/rest/api/storageservices/query-entities.
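A minimal sketch in Python of building that filter string. The helper name partition_filter is hypothetical; it only produces the OData expression and doubles any embedded single quote, which is how OData escapes quotes inside string literals.

```python
def partition_filter(partition_key: str) -> str:
    """Build an OData filter matching every entity in one partition."""
    # OData string literals escape an embedded single quote by doubling it.
    escaped = partition_key.replace("'", "''")
    return f"PartitionKey eq '{escaped}'"


if __name__ == "__main__":
    print(partition_filter("U001"))  # PartitionKey eq 'U001'
```

With the azure-data-tables Python package, a string like this can be passed as query_filter to TableClient.query_entities, which pages through all matching entities. That method also accepts a parameters dict (for example query_filter="PartitionKey eq @pk", parameters={"pk": "U001"}), which avoids escaping values by hand.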

Related

Is it necessary to use all the columns defined as the primary key to query a Cassandra database?

I am using a Cassandra database and need to define the primary key, which is a combination of a partition key and clustering keys. As per the business requirement, the database needs to be queried by the combination of two fields: a customer number and createdAt (a Unix timestamp value). These columns cannot be used as the primary key on their own because they cannot uniquely identify a row. So, is it correct to add the uuid column as a clustering key to make the primary key unique, so that the primary key becomes customerNumber (partition key), createdAt (clustering key), uuid (clustering key)? The database will never be queried by the whole primary key; it will always be queried by part of it, i.e. customer number and createdAt. uuid will never be used in queries.
So if I understand correctly, your PRIMARY KEY definition looks like this:
PRIMARY KEY (customerNumber,createdAt,uuid)
It will always be queried based on the part of the Primary key
Yes, querying by part of the PRIMARY KEY definition is fine, in your case. Cassandra tries to restrict queries to a single node, and it achieves this by ensuring that an entire partition is written to a single node (and then replicated). Because of this, you really only need to supply the partition key on your queries (customerNumber), and they should work.
Supplying additional PRIMARY KEY components, however, is helpful: in a high-throughput scenario, the smaller you can keep your result set payloads, the better.
tl;dr
Querying by customerNumber and createdAt will be just fine.
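The rule behind this answer can be sketched as a small Python check. This is a simplified model, not Cassandra's actual query validator: it assumes equality restrictions only (real Cassandra also allows a range on the last restricted clustering column), and the function name is hypothetical. A query can run without ALLOW FILTERING when the whole partition key is restricted, the restricted clustering columns form a prefix of the clustering order, and no regular column is restricted.

```python
from typing import Sequence, Set


def is_key_prefix_query(
    partition_key: Sequence[str],
    clustering_columns: Sequence[str],
    restricted: Set[str],
) -> bool:
    """Simplified check: can a query on `restricted` columns skip ALLOW FILTERING?"""
    key_columns = set(partition_key) | set(clustering_columns)
    # Restricting a regular (non-key) column needs ALLOW FILTERING or an index.
    if not restricted <= key_columns:
        return False
    # Every partition key column must be restricted so the query hits one partition.
    if not set(partition_key) <= restricted:
        return False
    # Restricted clustering columns must be a prefix of the clustering order.
    clustering_restricted = [c for c in clustering_columns if c in restricted]
    return clustering_restricted == list(clustering_columns[: len(clustering_restricted)])
```

For PRIMARY KEY (customerNumber, createdAt, uuid), restricting {customerNumber} or {customerNumber, createdAt} passes, while {customerNumber, uuid} fails because createdAt is skipped.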

Filter on the partition and the clustering key with an additional criterion

I want to filter on a table that has a partition and a clustering key with another criteria on a regular column. I got the following warning.
InvalidQueryException: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING
I understand the problem if the partition and the clustering key are not used. In my case, is it a relevant error or can I ignore it?
Here is an example of the table and query.
CREATE TABLE mytable(
name text,
id uuid,
deleted boolean,
PRIMARY KEY((name),id)
)
SELECT id FROM mytable WHERE name='myname' AND id='myid' AND deleted=false;
In Cassandra, you normally can't filter on a non-primary-key column unless you create an index on it.
In Cassandra 3.0 or later, filtering on a non-primary-key column is allowed with ALLOW FILTERING, but its performance is unpredictable.
In Cassandra 3.0 or later, if you provide the full primary key (as in your query), you can run the query with ALLOW FILTERING and ignore the warning: the full key identifies at most one row, so only that row is filtered.
Otherwise, filter on the client side, or remove the deleted column and create another table:
Instead of setting deleted to true, move the row to another table, say mytable_deleted:
CREATE TABLE mytable_deleted (
name text,
id uuid,
PRIMARY KEY (name, id)
);
Now mytable holds only non-deleted data and mytable_deleted holds deleted data, so your query no longer needs a filter on deleted.
or
Create an index on it:
Note that deleted is a low-cardinality column, so keep this in mind:
A query on an indexed column in a large cluster typically requires collating responses from multiple data partitions. The query response slows down as more machines are added to the cluster. You can avoid a performance hit when looking for a row in a large partition by narrowing the search.
Read more: When not to use an index
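The "move to another table" option can be sketched in Python with in-memory dicts standing in for mytable and mytable_deleted. The class and method names are hypothetical and this is not a Cassandra driver API; against a real cluster, the move would be an INSERT into mytable_deleted plus a DELETE from mytable, possibly grouped in a logged BATCH so both statements are eventually applied.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple
from uuid import UUID

Key = Tuple[str, UUID]  # (name, id): the primary key of both tables


@dataclass
class SoftDeleteStore:
    """In-memory stand-in for mytable and mytable_deleted (hypothetical)."""

    live: Dict[Key, dict] = field(default_factory=dict)
    deleted: Dict[Key, dict] = field(default_factory=dict)

    def insert(self, name: str, row_id: UUID, row: dict) -> None:
        self.live[(name, row_id)] = row

    def delete(self, name: str, row_id: UUID) -> None:
        # Instead of setting deleted = true, move the row to the other table.
        row = self.live.pop((name, row_id), None)
        if row is not None:
            self.deleted[(name, row_id)] = row

    def ids_for(self, name: str) -> List[UUID]:
        # Equivalent of: SELECT id FROM mytable WHERE name = ?
        # No regular-column filter is needed, so no ALLOW FILTERING.
        return [rid for (n, rid) in self.live if n == name]
```

The design choice is to encode the deleted flag in which table a row lives in, so reads stay pure primary-key lookups.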

Is It Possible To BatchGet Multiple Items By Partition Key Only DynamoDB

I have items with ItemIDs and Paths. ItemID is the partition key and Path is the range key. If I have multiple ItemIDs I want to query, but I don't want to include the range key, is it possible to do it with batchGet, or will I have to use a query for each of the ItemIDs? I have tried batchGet but get the error "The provided key element does not match the schema".
No, it is not possible to get items by partition key only. For a table with a composite key, the BatchGetItem API requires both the partition key and the range (sort) key:
Keys - An array of primary key attribute values that define specific items in the table. For each primary key, you must provide all of the key attributes. For example, with a simple primary key, you only need to provide the partition key value. For a composite key, you must provide both the partition key value and the sort key value.
However, you can use the Query API to get the data by partition key only, with one query per ItemID.
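A sketch in Python of "one Query per ItemID", assuming ItemID is a string attribute. The function name and table name are hypothetical; the query callable is expected to behave like boto3's low-level DynamoDB client.query, and TableName, KeyConditionExpression, ExpressionAttributeValues, LastEvaluatedKey and ExclusiveStartKey are the Query API's own parameter names. Because Query results are paginated, the loop follows LastEvaluatedKey until no more pages remain.

```python
from typing import Any, Callable, Dict, Iterable, List

QueryFn = Callable[..., Dict[str, Any]]


def query_by_partition_keys(
    query: QueryFn, table: str, item_ids: Iterable[str]
) -> List[Dict[str, Any]]:
    """Run one Query per partition key and collect every page of results."""
    items: List[Dict[str, Any]] = []
    for item_id in item_ids:
        params: Dict[str, Any] = {
            "TableName": table,
            # Only the partition key is constrained, so every Path (sort key) matches.
            "KeyConditionExpression": "ItemID = :pk",
            "ExpressionAttributeValues": {":pk": {"S": item_id}},
        }
        while True:
            page = query(**params)
            items.extend(page.get("Items", []))
            # DynamoDB returns LastEvaluatedKey when more pages remain.
            last_key = page.get("LastEvaluatedKey")
            if not last_key:
                break
            params["ExclusiveStartKey"] = last_key
    return items
```

Usage might look like query_by_partition_keys(boto3.client("dynamodb").query, "MyTable", ["id1", "id2"]), where "MyTable" and the IDs are placeholders.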

Azure Table Storage - Retrieving all entities matching a partial row key

I am just learning Azure Table Storage and I'm able to save and retrieve entities without any problem. However, I'd like to do the following. Say I have row keys (all with the same partition key) that look as follows:
KJV-C1-V1
KJV-C1-V2
KJV-C1-V3
KJV-C2-V1
KJV-C2-V2
KJV-C2-V3
I'd like to be able to perform two types of queries in .NET C#:
Retrieve all entities with row keys that start with 'KJV-C1'.
Retrieve all entities with row keys that contain '-C1-' in the key
Preferably I'd like to be able to do this without reading all entities in the partition and filtering out the ones that don't match the pattern I'm looking for. Is this possible with Azure Table Storage queries?
Retrieve all entities with row keys that start with 'KJV-C1'.
This is possible. Sample OData query:
PartitionKey eq 'your partition key' and (RowKey ge 'KJV-C1' and RowKey lt 'KJV-C2')
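The prefix query above generalizes: take the prefix as the lower bound and the prefix with its last character incremented as the exclusive upper bound. A minimal Python sketch, assuming ordinal string comparison (which holds for ASCII keys like these), a non-empty prefix, and a last character below the maximum code point; the function name is hypothetical.

```python
def prefix_range_filter(partition_key: str, prefix: str) -> str:
    """OData filter for row keys starting with `prefix` within one partition."""

    def quote(value: str) -> str:
        # OData escapes an embedded single quote by doubling it.
        return "'" + value.replace("'", "''") + "'"

    # Exclusive upper bound: the prefix with its last character incremented.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return (
        f"PartitionKey eq {quote(partition_key)} and "
        f"(RowKey ge {quote(prefix)} and RowKey lt {quote(upper)})"
    )
```

Note that the prefix 'KJV-C1' also matches a key such as 'KJV-C10-V1'; using 'KJV-C1-' as the prefix (upper bound 'KJV-C1.') restricts the result to chapter 1 only.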
Retrieve all entities with row keys that contain '-C1-' in the key
This unfortunately is not possible. You would have to fetch all entities and filter the data on the client side.
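Client-side filtering means every entity in the partition is transferred first and the substring test happens afterwards. A minimal Python sketch, with entities modeled as plain dicts and a hypothetical function name:

```python
from typing import Iterable, List, Mapping


def row_keys_containing(
    entities: Iterable[Mapping[str, str]], fragment: str
) -> List[Mapping[str, str]]:
    """Client-side 'contains' filter, applied after fetching the whole partition."""
    return [e for e in entities if fragment in e["RowKey"]]
```

Passing the six example entities with the fragment '-C1-' keeps only KJV-C1-V1, KJV-C1-V2 and KJV-C1-V3.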
You cannot do something like Contains() on keys. But since string comparison with CompareTo() is supported, you can get there by slightly modifying your table design.
Maintain multiple partition keys instead of a single one. You can simply move the 'KJV' part of your row key into the partition key, and use C1-V1, C1-V2 as your row keys.
Then, if you want
All entries of KJV - Query for partition key 'KJV'
All 'C1' entries of KJV - Query for partition key 'KJV' and row key starting with 'C1-'
All entries for C1 - Query for row key starting with 'C1-'
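Under the redesigned keys, the three queries above can be expressed as OData filters. A Python sketch with hypothetical helper names; it assumes the old row keys always have the form product-chapter-verse and uses the same "increment the last character" trick for the prefix upper bound.

```python
from typing import Tuple


def split_key(old_row_key: str) -> Tuple[str, str]:
    """Map 'KJV-C1-V1' to (PartitionKey='KJV', RowKey='C1-V1')."""
    partition, _, rest = old_row_key.partition("-")
    return partition, rest


def quote(value: str) -> str:
    return "'" + value.replace("'", "''") + "'"


def starts_with(prefix: str) -> str:
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return f"RowKey ge {quote(prefix)} and RowKey lt {quote(upper)}"


# The three queries from the list above, as OData filters.
all_kjv = f"PartitionKey eq {quote('KJV')}"
kjv_c1 = f"{all_kjv} and ({starts_with('C1-')})"
all_c1 = starts_with("C1-")  # no PartitionKey condition: scans every partition
```

The third filter has no partition condition, so it touches every partition; it is still a key range comparison, but it is the most expensive of the three.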
Or, without changing the table design, loop through your major products like 'KJV', build one query per product for row keys starting with e.g. 'KJV-C1-', and union the results to get the final result.
Keep in mind that table storage does not support all LINQ operations, and sometimes you need to design the table keys with your most common queries in mind.
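The "loop and union" approach without a redesign can be sketched in Python. Here run_query is a hypothetical callable that executes an OData filter and returns matching entities (for example a thin wrapper around TableClient.query_entities); the product list and chapter label are inputs you would supply, and "NIV" in the usage below is a made-up second product.

```python
from typing import Callable, Dict, Iterable, List

Entity = Dict[str, str]


def quote(value: str) -> str:
    return "'" + value.replace("'", "''") + "'"


def chapter_across_products(
    run_query: Callable[[str], Iterable[Entity]],
    partition_key: str,
    products: Iterable[str],
    chapter: str,
) -> List[Entity]:
    """Union of one prefix-range query per product, e.g. 'KJV-C1-', 'NIV-C1-'."""
    results: Dict[str, Entity] = {}
    for product in products:
        prefix = f"{product}-{chapter}-"
        upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
        flt = (
            f"PartitionKey eq {quote(partition_key)} and "
            f"(RowKey ge {quote(prefix)} and RowKey lt {quote(upper)})"
        )
        for entity in run_query(flt):
            # Key by RowKey so the union has no duplicates.
            results[entity["RowKey"]] = entity
    return list(results.values())
```

For example, chapter_across_products(run_query, "your partition key", ["KJV", "NIV"], "C1") issues two range queries and merges them; each query reads only the matching key range, unlike the client-side contains() filter.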

How to choose Azure Table PartitionKey and RowKey for a table that already has a unique attribute

My entity is a key-value pair. 90% of the time I'll be retrieving the entity by key, but 10% of the time I'll also do a reverse lookup, i.e. I'll search by value and get the key.
The key and value both are guaranteed to be unique and hence their combination is also guaranteed to be unique.
Is it correct to use Key as PartitionKey and Value as RowKey?
I believe this will also ensure that my data is perfectly load balanced between servers, since the PartitionKey is unique.
Are there any problems in the above decision?
Under any circumstances, is it practical to have a hard-coded partition key, i.e. all rows have the same partition key and only the RowKey is unique?
Is it doable? Yes, but depending on the size of your data, I'm not so sure it's a good idea. When you query by partition key, Table Storage can go directly to the exact partition and retrieve all your records. If you query by RowKey alone, Table Storage has to check whether the row exists in every partition of the table. So if you have 1000 key-value pairs, searching by your key reads a single partition/row, whereas searching by your value alone reads all 1000 partitions!
I faced a similar problem and solved it in two ways:
Have two different tables: one with your key as the PartitionKey, the other with your value as the PartitionKey. Storage is cheap, so duplicating data shouldn't cost much.
(What I finally did) If you're effectively returning single entities based on a unique key, just store them in blobs (partitioned and pivoted as in the first approach): you don't need to traverse a table, so don't.
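The two-table layout can be sketched in Python with dicts standing in for the two tables; the class and method names are hypothetical. Both lookups become single-partition point reads. One caveat worth hedging: Azure Table entity group transactions only span a single partition in a single table, so the two writes are not atomic and a real implementation would need to handle one write succeeding while the other fails.

```python
from typing import Dict, Optional


class TwoWayIndex:
    """In-memory sketch of the two-table layout (hypothetical names).

    forward: PartitionKey = key, reverse: PartitionKey = value.
    Assumes keys and values are both unique, as stated in the question.
    """

    def __init__(self) -> None:
        self.forward: Dict[str, str] = {}  # table 1: key -> value
        self.reverse: Dict[str, str] = {}  # table 2: value -> key

    def put(self, key: str, value: str) -> None:
        # If the key already had a value, drop the stale reverse entry first.
        old = self.forward.get(key)
        if old is not None:
            del self.reverse[old]
        self.forward[key] = value
        self.reverse[value] = key

    def value_for(self, key: str) -> Optional[str]:
        return self.forward.get(key)  # the 90% case: lookup by key

    def key_for(self, value: str) -> Optional[str]:
        return self.reverse.get(value)  # the 10% case: reverse lookup
```

The duplication trades extra storage and a second write for avoiding a full-table scan on reverse lookups.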
