Composite key as the partition key for azure table storage - azure

I have a data model which has properties, say A, B, C, D, ..., G. This model has a composite key (A, B, C, D). I need to store entities of this data model in Azure storage.
Should I concatenate (A+B+C+D) and then store the result as the value of the partition key (for faster retrieval operations)?
What is the best practice for choosing the partition key/row key in such cases?

Should I concatenate (A+B+C+D) and then store the result as the value of the partition key (for faster retrieval operations)?
As the official documentation says about considering queries:
Knowing the queries that you will be using will allow you to determine which properties are important to consider for the PartitionKey. The properties that are used in the queries are candidates for the PartitionKey.
If the entity has more than two key properties, you could use a composite key of concatenated values.
What is the best practice for choosing the partition key/row key in such cases?
For better query performance, you need to consider the properties that are used in your queries as candidates for the PartitionKey or RowKey. Here is a simple sample to give you a better understanding of choosing the PK/RK:
There is a table called Product which has the following properties:
| ID | Name | CategoryID | SubCategoryID | DeliveryType | Price | Status | SalesRegion |
If the query is frequently based on CategoryID and SubCategoryID, we could combine CategoryID_SubCategoryID as the PartitionKey to quickly locate the specific partition and retrieve all the products within the specific Category. For the RowKey, we could just set ID for querying on the specific product ID or SalesRegion_Price_DeliveryType for filtering the products in the order of SalesRegion,Price,DeliveryType.
Additionally, you could follow this tutorial about designing scalable and performant Azure Storage Table.
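The key choice described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: make_key and to_entity are hypothetical helpers (not part of any Azure SDK), the "_" separator and the sample product values are made up, and the character check covers only some of the characters that the Table service rejects in key values (/, \, #, ? and control characters).

```python
# Some characters that Azure Table storage does not allow in key values.
FORBIDDEN = set("/\\#?")


def make_key(*parts, sep="_"):
    """Join key parts into one composite key string (hypothetical helper)."""
    texts = [str(p) for p in parts]
    for text in texts:
        if any(ch in FORBIDDEN or ord(ch) < 32 for ch in text):
            raise ValueError(f"illegal character in key part: {text!r}")
        if sep in text:
            # Otherwise ("a_b", "c") and ("a", "b_c") would produce the same key.
            raise ValueError(f"key part contains the separator: {text!r}")
    return sep.join(texts)


def to_entity(product):
    """Copy a Product record and add PartitionKey/RowKey based on the query pattern."""
    entity = dict(product)
    entity["PartitionKey"] = make_key(product["CategoryID"], product["SubCategoryID"])
    entity["RowKey"] = make_key(product["ID"])
    return entity


# Made-up sample record using the Product columns from the answer.
product = {"ID": 1001, "Name": "Kettle", "CategoryID": 7, "SubCategoryID": 42,
           "DeliveryType": "Courier", "Price": 19.99, "Status": "Active",
           "SalesRegion": "EU"}
entity = to_entity(product)
print(entity["PartitionKey"], entity["RowKey"])
```

Rejecting parts that contain the separator is a design choice of this sketch: it keeps two different key tuples from collapsing into the same concatenated string.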

Check this out:
https://learn.microsoft.com/en-us/rest/api/storageservices/fileservices/designing-a-scalable-partitioning-strategy-for-azure-table-storage
Looks like a good starting point.

Related

Should I set foreign keys in Cassandra tables?

I am new to Cassandra and coming from relational background. I learned Cassandra does not support JOINs hence no concept of foreign keys. Suppose I have two tables:
Users
id
name
Cities
id
name
In the RDBMS world I would store city_id in the users table. Since there is no concept of joins and you are allowed to duplicate data, is it still worth storing city_id in the users table when I can create a users_by_cities table instead?
The main Cassandra concept is that you design tables based off of your queries (as writes to the table have no restrictions). The design is based off of the query filters. An application that queries a table by some ID is somewhat unnatural, as the CITY_ID could be any value and typically is unknown (unless you ran a prior query to get it). Something more natural may be CITY_NAME. Anyway, assuming there are no indexes on the table (which are mere tables themselves), there are rules in Cassandra regarding the filters you provide and the table design, mainly that, at a minimum, one of the filters MUST be the partition key. The partition key helps direct Cassandra to the correct node for the data (which is how the reads are optimized). If none of your filters are the partition key, you'll get an error (unless you use ALLOW FILTERING, which is a no-no). The other filters, if there are any, must be the clustering columns (you can't have a filter that is neither the partition key nor a clustering column - again, unless you use ALLOW FILTERING).
These restrictions, coming from the RDBMS world, are unnatural and hard to adjust to, and because of them, you may have to duplicate data into very similar structures (maybe the only difference is the partition keys and clustering columns). For the most part, it is up to the application to manipulate each structure when changes occur, and the application must know which table to query based off of the filters provided. All of these are considered painful coming from a relational world (where you can do whatever you want to one structure). These "constraints" need to be weighed against the reasons why you chose Cassandra for your storage engine.
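As a rough illustration of what "the application manipulates each structure" means, here is a minimal in-memory Python sketch. The two dicts merely stand in for a users table and a users_by_cities table; the function names and sample data are hypothetical, and no Cassandra driver is involved.

```python
users = {}          # stands in for a "users" table, keyed by user id
users_by_city = {}  # stands in for a "users_by_cities" table, keyed by city name


def save_user(user_id, name, city_name):
    """Write the user to both structures, removing any stale copy first."""
    old = users.get(user_id)
    if old is not None:
        # The duplicate row lives under the old city, so it must be deleted there.
        users_by_city[old["city"]].pop(user_id, None)
    users[user_id] = {"name": name, "city": city_name}
    users_by_city.setdefault(city_name, {})[user_id] = name


def find_user(user_id):
    """Query by id: served by the first structure."""
    return users.get(user_id)


def find_users_in_city(city_name):
    """Query by city: served by the second structure."""
    return sorted(users_by_city.get(city_name, {}).values())


save_user(1, "Ann", "Oslo")
save_user(2, "Bo", "Oslo")
save_user(1, "Ann", "Rome")  # Ann moves, so both structures must change
print(find_users_in_city("Oslo"), find_users_in_city("Rome"))
```

The point of the sketch is the extra bookkeeping in save_user: every change has to be applied to each duplicated structure, and each read has to pick the structure that matches its filter.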
Hope this helps.
-Jim

Search for more than one element in a list in Cassandra

I'm learning how the data model works in Cassandra, what things you can do and what not, etc.
I've seen you can have collections and I'm wondering if you can search for the elements inside a collection. I've seen that you can look for one element with CONTAINS, but if you want to look for more than one you need to add more filters. Is there a better way to do this? Is it bad practice?
This is my table definition:
CREATE TABLE data (
group_id int,
user timeuuid,
friends LIST<VARCHAR>,
PRIMARY KEY (group_id, user)
);
And this is what I know I can use to look for more than one item in the list:
SELECT * FROM data WHERE friends CONTAINS 'bob' AND friends CONTAINS 'Pete' ALLOW FILTERING;
Thank you
Secondary indexes are generally not recommended for performance reasons.
In Cassandra, query-based modelling should generally be followed.
That would mean another table:
CREATE TABLE friend_group_relation (
friend VARCHAR,
group_id int,
-- user timeuuid,  (uncomment if the user column is needed)
PRIMARY KEY ((friend), group_id)
);
Now you can use either an IN query (not recommended) or asynchronous queries (strongly recommended, very fast response) on this table.
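A minimal sketch of the asynchronous approach, written in Python without a Cassandra driver: one lookup per friend runs concurrently and the resulting group ids are intersected on the client. FRIEND_GROUPS, groups_for and groups_with_all are hypothetical names, and the dict with its sample data only stands in for the friend_group_relation table.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for friend_group_relation: partition key (friend) -> group ids.
FRIEND_GROUPS = {"bob": {1, 2, 3}, "Pete": {2, 3, 5}, "ann": {3}}


def groups_for(friend):
    """Hypothetical single-partition lookup, i.e. ... WHERE friend = ?"""
    return set(FRIEND_GROUPS.get(friend, ()))


def groups_with_all(friends):
    """Run one lookup per friend concurrently, then intersect the results."""
    if not friends:
        return set()
    with ThreadPoolExecutor(max_workers=len(friends)) as pool:
        results = list(pool.map(groups_for, friends))
    return set.intersection(*results)


print(sorted(groups_with_all(["bob", "Pete"])))
```

Each lookup touches exactly one partition, which is why this pattern tends to scale better than a single query that filters inside a collection.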
You can follow two different approaches:
Pure Cassandra: use a secondary index on your collection type, as described in the documentation.
You may also be able to use Solr and run a query against Solr to retrieve your entries. Although this may look like a more complicated solution because it requires an extra tool, it avoids using secondary indexes on Cassandra. Secondary indexes on Cassandra are really expensive and, depending on your schema definition, may hurt performance.

cassandra: can you query against a collection field?

Say you wanted to keep a friends list in such a field: can you run a query along the lines of: where user_id = xxx and friend = 'bob'?
If a collection is not right for this, what is the proper way to keep track of friends in Cassandra?
Secondary indexes on collections are not yet supported, but development is in progress (CASSANDRA-4511).
In your model, if you know the user_id you could fetch the user and check on the application side whether 'bob' is in their friends list. If you need to query whether person_A is friends with person_B, you can extract the collection into its own table, but this model will require two queries.
CREATE TABLE friends (
user_id text,
friend text,
PRIMARY KEY(user_id, friend)
);
CREATE INDEX idx_friends on friends (friend);
Say you have a result from the above table like:
 user_id | friend
---------+--------
  daniel |    bob
  daniel |   jack
    jack |    bob
If you want to find all the people following bob, you can use SELECT * FROM friends WHERE friend='bob'. You could also do this without the secondary index by using ALLOW FILTERING, but this can lead to unpredictable performance:
The ALLOW FILTERING option allows to explicitly allow (some) queries that require filtering. Please note that a query using ALLOW FILTERING may thus have unpredictable performance (for the definition above), i.e. even a query that selects a handful of records may exhibit performance that depends on the total amount of data stored in the cluster.
Docs for ALLOW FILTERING.
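To make the difference concrete, here is a loose in-memory analogy in Python, not a model of how Cassandra actually stores or distributes data. The scan inspects every row (roughly what filtering implies), while the lookup uses a structure built ahead of time (roughly what an index provides). All function names are hypothetical; the rows are the sample result above.

```python
from collections import defaultdict

rows = [("daniel", "bob"), ("daniel", "jack"), ("jack", "bob")]


def followers_by_scan(rows, friend):
    """Filtering analogy: every row is inspected, so cost grows with table size."""
    return [user for user, f in rows if f == friend]


def build_index(rows):
    """Index analogy: map each friend value to the users that reference it."""
    index = defaultdict(list)
    for user, f in rows:
        index[f].append(user)
    return index


def followers_by_index(index, friend):
    """Direct lookup: cost depends on the number of matches, not on all rows."""
    return index.get(friend, [])


index = build_index(rows)
print(followers_by_scan(rows, "bob"), followers_by_index(index, "bob"))
```

Both calls return the same users; only the amount of data touched to produce the answer differs.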

Collection of embedded objects using Cassandra CQL

I am trying to put my domain model into Cassandra using CQL. Let's say I have a USER_FAVOURITES table. Each favourites entry has an ID as its PRIMARY KEY. I want to store an ordered list of up to 10 records, each with multiple fields (field_name, field_location and so on).
Is it a good idea to model the table like this?
CREATE TABLE USER_FAVOURITES (
fav_id text PRIMARY KEY,
field_name_list list<text>,
field_location_list list<text>
);
An object is then constructed from the list items at matching indices (e.g.
Object(field_name_list[3], field_location_list[3])).
I always query the favourites together. I may want to add an item at some position: start, end or middle.
Is this good practice? It doesn't look like it, but I am not sure how to group objects in this case, especially when I want to keep them ordered by, for example, field_location or a more complex ordering rule.
I'd suggest the following structure:
CREATE TABLE USER_FAVOURITES (
fav_id text PRIMARY KEY,
favs map<int, blob>
);
This would allow you to get access to any item via index. The value part of map is blob, as one can easily serialize a whole needed object into binary and deserialize later.
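A small Python sketch of the serialize-into-a-blob idea, under stated assumptions: JSON is just one possible encoding, the helper names and sample values are hypothetical, and a plain dict stands in for the map<int, blob> column.

```python
import json


def encode_favourite(name, location):
    """Serialize one favourite into bytes, as it would be stored in the blob value."""
    return json.dumps({"field_name": name, "field_location": location}).encode("utf-8")


def decode_favourite(blob):
    """Rebuild the favourite from the stored bytes."""
    return json.loads(blob.decode("utf-8"))


favs = {}  # stands in for the map<int, blob> column: position -> serialized object
favs[0] = encode_favourite("Pitch A", "North park")
favs[2] = encode_favourite("Pitch C", "Riverside")
favs[1] = encode_favourite("Pitch B", "City centre")

# Reading in key order restores the intended sequence regardless of write order.
ordered = [decode_favourite(favs[i]) for i in sorted(favs)]
print([item["field_name"] for item in ordered])
```

Note that with integer positions as keys, inserting an item in the middle means rewriting the keys of the items after it; with at most 10 entries that is a small cost.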
My personal suggestion is not to rely too heavily on Cassandra collections, as the feature is likely to evolve further in the future. That said, the scenario described above is perfectly possible and there is no harm in doing it.

SQL or Oracle Table structure in Redis

I am using Node and planning to use Redis to store data [the data will be in SQL or Oracle table format with many fields like ID, Name, City, Marks, etc.].
I found that we can store only keys and values in Redis, with three data structures [list, set or sorted set].
Is it possible for me to store something like Table name [key name] : Details
with values like ID : 1, Name : john, Country : Russia,
ID : 2, Name : Rose, Country : US, etc.?
Is there any other data structure apart from list, set and sorted set in Redis?
Yes. See the docs.
http://redis.io/topics/data-types
You also have the Hash data structure...
Database tables are used to store Entities. A loose definition of an Entity is something that has a unique primary key. In Redis, Entities are usually stored using the Hash data structure, where columns in the database become fields in the Hash. The primary key is stored in the key of the Hash.
Database tables also store non-entities, such as relationships between Entities. For example one-to-many relationship is typically done using a foreign key. In Redis, such relationships can be modeled in Sets, Lists or SortedSets.
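The mapping described above can be sketched in plain Python without a Redis server. The helper functions below only mimic the behaviour of the Redis commands HSET, HGETALL, SADD and SMEMBERS on in-memory dicts; the key naming scheme (student:<id>, students:country:<name>) is a hypothetical convention chosen for this example.

```python
hashes = {}  # key -> dict of fields, mimicking Redis Hashes
sets = {}    # key -> set of members, mimicking Redis Sets


def hset(key, mapping):
    """Mimics HSET key field value [field value ...]."""
    hashes.setdefault(key, {}).update(mapping)


def hgetall(key):
    """Mimics HGETALL key."""
    return dict(hashes.get(key, {}))


def sadd(key, member):
    """Mimics SADD key member."""
    sets.setdefault(key, set()).add(member)


def smembers(key):
    """Mimics SMEMBERS key."""
    return set(sets.get(key, set()))


def save_student(student_id, name, country):
    # One Hash per entity: the primary key becomes part of the Redis key.
    hset(f"student:{student_id}", {"Name": name, "Country": country})
    # One Set per country models the one-to-many relationship.
    sadd(f"students:country:{country}", student_id)


save_student(1, "john", "Russia")
save_student(2, "Rose", "US")
print(hgetall("student:1"), smembers("students:country:US"))
```

With a real Redis client the same two writes per entity would apply: one Hash for the row and one Set entry for each relationship you need to query.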
