Cassandra: Row-level access control on non-partition key

Is there any way to get row-level access control on a non-partition key in Cassandra?
For instance, RLAC (row-level access control) is possible on a partition key, as described in the link below:
https://docs.datastax.com/en/security/6.7/security/secRlac.html
Are there any suggestions for this problem?

There is no way to do that on the Cassandra side; the restricted column must be a partition key. You may be able to solve it with ACLs on the application side instead.
I hope this helps!
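For reference, here is a minimal sketch of what RLAC does support, based on the DSE 6.7 docs linked above (RLAC is a DataStax Enterprise feature, not available in open-source Cassandra; the keyspace, table, column, and role names are hypothetical):

    -- RLAC can only target a partition key column (here: tenant_id).
    RESTRICT ROWS ON app.orders USING tenant_id;

    -- Grant the role access only to rows whose tenant_id is 'acme'.
    GRANT SELECT ON 'acme' ROWS IN app.orders TO acme_role;

Anything finer-grained than the partition key has to be enforced outside the database, as suggested above.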

Related

How to dynamically change UniformInt64 partition count and partition low/high key without redeploying the service?

Hi, I have a stateless service partitioned using the UniformInt64 scheme. Is there a way to change the partition count or the high/low key on the fly, without redeploying the service? I see that the instance count can be changed with a PowerShell command, but I didn't find a way to update the partition count and low/high key the same way.
You can't change partitions on the fly. Removing or adding partitions would require all stored data in all partitions to be re-partitioned, and there's no support for this in Service Fabric.
To deal with this, you can introduce an intermediate service that acts as a sort of 'librarian' when you need to fetch or store data.
Here's a video that explains more about partitioning and the librarian service.
More docs about partitioning here and a really good blog post here.

Partition key for Azure Cosmos DB collection

I am a bit new to Azure Cosmos DB and trying to understand the concepts.
I would like help deciding the best possible partition key for a DocumentDB collection. Please refer to the image below, which shows possible partitions using different partition keys.
As mentioned in the blog post here,
An ideal partition key is one that appears frequently as a filter in
your queries and has sufficient cardinality to ensure your solution is
scalable.
Based on that, I think UserId can be used as the partition key in my case.
Can someone please suggest which key is the best candidate for the partition key?
In "10 things to know about DocumentDB Partitioned Collections" and the official Microsoft documentation, you can find lots of very good advice about the choice of partition key, so I won't repeat it here.
The selection of a partition key depends on the data stored in the database and the most frequent query filter criteria.
It is often advised to partition on something like userid, which is a good choice if you have one. Suppose your business logic runs many queries for a given userid and each looks up no more than a few hundred entries; in such cases the data can be quickly extracted from a single partition without the overhead of collating data across partitions.
However, if you have millions of records per user, then partitioning on userid is perhaps the worst option, as extracting large volumes of data from a single hot partition will quickly cost more than the overhead of collating across partitions. In such cases you want to distribute user data as evenly as possible over all partitions, so you may need to find another column to be the partition key.
So, if the data volume is very large, I suggest that you run some simple tests based on your business logic and choose the partition key that performs best. After all, the partition key cannot be changed once it is set up.
Hope it helps you.
It depends, but here are a few things to consider:
The blog post you mentioned says:
Additionally, the storage size for documents belonging to the same partition key is limited to 10GB. An ideal partition key is one that appears frequently as a filter in your queries and has sufficient cardinality to ensure your solution is scalable.
Also, I really recommend checking this post and video: https://learn.microsoft.com/en-us/azure/cosmos-db/partition-data
The choice of the partition key is an important decision that you have to make at design time. You must pick a property name that has a wide range of values and has even access patterns.
So make sure to choose a partition key that has many distinct values and meets those requirements.

Cassandra: Sort by query

I have a somewhat special request.
Situation: I use a Redis DB to store geo data and use GEORADIUS to get it back, sorted by distance. With these keys I then look up the data in Cassandra, but Cassandra's result comes back sorted by key (or something else).
What I want is to get the information back in the same order in which I requested it.
The partition key is built from an id (which I get back from Redis) and a status.
Can I tell Cassandra to sort by the id array?
Partition keys are designed to be randomly distributed across different nodes. You can use ByteOrderedPartitioner to do ordered queries, but BOP is considered an anti-pattern in Cassandra and I highly recommend against it. You can read more about it here: Cassandra ByteOrderedPartitioner.
You can add more columns to the primary key that determine how data is stored on disk within a partition. These are known as clustering keys, and you can do ORDER BY queries on them. This is a good document on clustering keys: https://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_compound_keys_c.html
If you can share more schema details, I can suggest what to use as clustering key.
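To make that concrete, here is a minimal sketch with hypothetical table and column names; rows are stored in clustering order inside each partition, and ORDER BY is only allowed on clustering columns within a single partition:

    CREATE TABLE geo.points (
        id      text,     -- partition key (the id returned by Redis)
        status  text,     -- clustering key: orders rows within the partition
        payload text,
        PRIMARY KEY ((id), status)
    ) WITH CLUSTERING ORDER BY (status ASC);

    -- ORDER BY may only reference clustering columns,
    -- and only when the query targets a single partition:
    SELECT * FROM geo.points WHERE id = 'abc' ORDER BY status DESC;

Note that this cannot reproduce the distance order coming from Redis; re-sorting the rows to match the Redis id list would still have to happen in the application.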

Cassandra: Data Type for Partition Key - Decimal or UUID

I want to describe the problem I am working on first:
Currently I am trying to find a strategy that would allow me to migrate data from an existing PostgreSQL database into a Cassandra cluster. The primary key in PostgreSQL is a decimal value with 25 digits. When I migrate the data, it would be nice if I could keep the value of the current primary key in one way or another and use it to uniquely identify the data in Cassandra. This key should be used as the partition key in Cassandra (no other columns are involved in the table I am talking about). After doing some research, I found out that a good practice is to use UUIDs in Cassandra. So now I have two possible solutions to my problem:
I can either create a transformation rule that would transfer my current decimal primary keys from the PostgreSQL database into UUIDs for Cassandra. Every time someone requests some of the old data, I would have to reapply the transformation rule to the key and use the UUID to search for the data in Cassandra. The transformation would happen in an application server that manages all communication with Cassandra (so no client will talk to Cassandra directly). New data added to Cassandra would of course be stored with a UUID.
The other solution, which I have already implemented in Java, is to use a decimal value as the partition key in Cassandra. Since it is possible that multiple application servers will talk to Cassandra concurrently, my current approach is to generate a UUID in my application and transform it into a decimal value. Using this approach, I could simply reuse all the existing primary keys from PostgreSQL.
I cannot simply create new keys for the existing data, since other applications have stored their own references to the old primary key values and will therefore try to request data with those keys.
Now here is my question: both approaches seem to work and end up with unique keys to identify my data. The distribution of data across all nodes should also be fine. But I wonder if there is any benefit in using a UUID over a decimal value as the partition key, or vice versa. I don't know exactly what Cassandra does to determine the hash value of the partition key and therefore cannot tell whether either data type is to be preferred. I am using the Murmur3Partitioner for Cassandra, if that is relevant.
Does anyone have any experience with this issue?
Thanks in advance for any answers.
There are two benefits of UUIDs that I know of.
First, they can be generated independently with little chance of collisions. This is very useful in distributed systems, since you often have multiple clients wanting to insert data with unique keys. In an RDBMS we had the luxury of auto-incrementing fields to give uniqueness, since that could easily be done atomically, but in a distributed database we don't have efficient global atomic locks to do that.
The second advantage is that UUIDs are fairly efficient in terms of storage, requiring only 16 bytes.
As long as your old decimal values are unique, you should be able to use them as partition keys.
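To make the two options concrete, here is a minimal CQL sketch with hypothetical keyspace and table names. The Murmur3Partitioner hashes the serialized bytes of the partition key regardless of its type, so both variants distribute evenly across nodes as long as the key values are unique:

    -- Option 2 from the question: keep the existing 25-digit decimal keys.
    CREATE TABLE migration.items_by_decimal (
        id   decimal,    -- CQL decimal holds arbitrary-precision values
        data text,
        PRIMARY KEY (id)
    );

    -- Option 1: use UUIDs instead.
    CREATE TABLE migration.items_by_uuid (
        id   uuid,
        data text,
        PRIMARY KEY (id)
    );

Given that, the choice is mostly about convenience: reusing the decimal keys avoids a transformation layer for the references other applications already hold.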

Making Cassandra store data on a local node

What is a simple way of configuring a Cassandra cluster so that if I try to store a key in it, it will be stored on the local node to which I issue the set/write command?
I am looking at IPartitioner, which allows me to specify how the key will be hashed, but it seems a bit heavyweight for something like this.
Thanks!
If you were able to write keys to arbitrary nodes, then on lookup the system would not know where the data for a given key lived and would have to query the whole cluster, which would be very slow.
By design, Cassandra spreads the data around in a known way so that lookups are quick.
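You can see that determinism from CQL itself: the token() function exposes the partitioner's hash of a key, and replica placement follows that token rather than whichever coordinator node received the write. A small sketch with a hypothetical table:

    -- Hypothetical table for illustration.
    CREATE TABLE demo.kv (
        k text PRIMARY KEY,
        v text
    );

    -- The partitioner maps each partition key to a token deterministically;
    -- the token, not the coordinator you connect to, decides which nodes own the row.
    SELECT k, token(k) FROM demo.kv;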
Check out this post by Jonathan Ellis, the primary maintainer of Cassandra.
