Cassandra ColumnFamily Counter Limitation

I'm trying to figure out the best schema for working with both counter and non-counter values. All these values are supposed to live in the same place, and I was going to use wide rows, but because Cassandra doesn't support mixing counter and non-counter columns, that won't work.
Would I have to create two separate column families, one to hold the counters and the other to hold the other data types?

Yes, your understanding is correct.
Always keep counters in a separate column family. Counter column families can now also use normal (non-counter) columns as part of a compound primary key, which is an added advantage: the counter table can be keyed the same way as the table holding the other data.

A counter column can't be part of the primary key.
Every column outside the primary key must be of the counter type.
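A minimal CQL sketch of the split described in this answer; the table and column names (page_info, page_counters, site_id, page_id, title, hits, shares) are hypothetical. Both tables share the same compound primary key made of ordinary columns, and every non-key column in the counter table is a counter:

```sql
-- Hypothetical table for the regular (non-counter) data.
CREATE TABLE page_info (
    site_id text,
    page_id text,
    title   text,
    PRIMARY KEY (site_id, page_id)
);

-- Counters live in their own table with the same compound primary key.
-- The key columns are ordinary types; every non-key column is a counter.
CREATE TABLE page_counters (
    site_id text,
    page_id text,
    hits    counter,
    shares  counter,
    PRIMARY KEY (site_id, page_id)
);

-- Counters are changed with UPDATE; a row that doesn't exist yet starts at 0.
UPDATE page_counters SET hits = hits + 1 WHERE site_id = 'example.com' AND page_id = 'home';
```

An INSERT into page_counters, or adding a non-counter column such as title to it, would be rejected by Cassandra.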

Related

Cassandra - Same partition key in different tables - when it is right?

I modeled my Cassandra schema so that I have a couple of tables with the same partition key, a UUID.
Each table has that partition key plus other columns holding the data for a specific query I want to run.
For example, table 1 has the UUID and a column for its status (no clustering keys in this table), and table 2 contains the same UUID (also without clustering keys) but with different columns holding other data for that UUID.
Is this the right way to model it? Is it wrong to duplicate the same partition key across tables so that each table holds the columns relevant to a specific use case? Or is it preferable to use only one table, query it, and pick out the data relevant to each use case in the code?
There's nothing wrong with this modeling. Whether it is better, or worse, than the obvious alternative of having just one table with both pieces of data, depends on your workload:
For example, if you commonly need to read both the status and the data columns of the same UUID, these reads will be more efficient if both are in the same table, which only needs to be looked up once. If you always read just one of them but not both, reads will be more efficient from separate tables. Also, if the workload is write-mostly rather than read-mostly, writing to just one table instead of two will be more efficient.
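To make the trade-off concrete, here is a hedged CQL sketch of the two alternatives; item_status, item_details, items and their columns are hypothetical names, not taken from the question:

```sql
-- Alternative 1: two tables sharing the same partition key.
CREATE TABLE item_status  (id uuid PRIMARY KEY, status text);
CREATE TABLE item_details (id uuid PRIMARY KEY, name text, payload text);

-- Reading both groups of columns takes two lookups.
SELECT status FROM item_status WHERE id = 123e4567-e89b-12d3-a456-426614174000;
SELECT name, payload FROM item_details WHERE id = 123e4567-e89b-12d3-a456-426614174000;

-- Alternative 2: one table holding everything; a single lookup returns all columns,
-- and each query can still select only the columns its use case needs.
CREATE TABLE items (id uuid PRIMARY KEY, status text, name text, payload text);
SELECT status, name, payload FROM items WHERE id = 123e4567-e89b-12d3-a456-426614174000;
```

Which layout wins depends on how often the two groups of columns are read, or written, together.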

Counter table in cassandra

What's the point of not allowing non-key, non-counter columns in a Cassandra counter table?
I have a table with some key and non-key columns, but I cannot add a counter column to it... although I want the rows to be sorted by a counter (hits).
If I create a separate table for the counter, how do I relate the two tables for sorting?
Thanks in advance.
Counters are a very different type of cell in Cassandra's internals. Everything about them is different from most other Cassandra types. They require special care, and it just isn't worth the complexity to be able to mix them with other cells.
You can use the same primary key structure in two tables, one with counters and one with other cells/columns. You just can't have the other cells/columns in the counter table.
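A sketch of that layout in CQL, assuming hypothetical videos and video_hits tables keyed by channel_id and video_id. Because ORDER BY in CQL can only use clustering columns, and a counter can never be part of the primary key, sorting by hits has to happen in the application after joining the two result sets on video_id:

```sql
-- Regular columns (hypothetical names).
CREATE TABLE videos (
    channel_id text,
    video_id   text,
    title      text,
    PRIMARY KEY (channel_id, video_id)
);

-- Same primary key structure, counters only.
CREATE TABLE video_hits (
    channel_id text,
    video_id   text,
    hits       counter,
    PRIMARY KEY (channel_id, video_id)
);

UPDATE video_hits SET hits = hits + 1 WHERE channel_id = 'c1' AND video_id = 'v1';

-- Both queries return rows ordered by video_id; the application matches them
-- on video_id and then sorts the combined rows by hits.
SELECT video_id, title FROM videos WHERE channel_id = 'c1';
SELECT video_id, hits FROM video_hits WHERE channel_id = 'c1';
```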

Replacing integer column in Cassandra table

In a table, the clustering key is an int column (chrg) that holds a system-generated number. The issue is:
Since it's defined as an int, it can only store values up to about 2 billion (2,147,483,647).
The table's data volume is huge, so within the next two months of loads we will hit the maximum value the column can store, after which loads will fail.
Hence the requirement is to change the column's data type to something larger, such as a long integer, with the least impact.
How can this be achieved with minimal downtime?
You cannot change the type of a primary key column.
So one approach I can think of is:
1. Create a separate table with the modified data type.
2. Modify your application to write data to both tables.
3. Use Spark with Cassandra to read the data from the old table and write it to the new table.
4. Then, in your application, stop writing to the old table.
With the above approach, I don't think you will have a major impact.
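A hedged CQL sketch of creating the new table and dual-writing to it. Only chrg comes from the question; the table names charges and charges_v2 and the columns account_id and amount are hypothetical, and the existing table is assumed to look like charges (account_id text, chrg int, amount decimal, PRIMARY KEY (account_id, chrg)). The Spark backfill is not shown:

```sql
-- New table: identical layout, but chrg is a 64-bit bigint instead of a 32-bit int.
CREATE TABLE charges_v2 (
    account_id text,
    chrg       bigint,
    amount     decimal,
    PRIMARY KEY (account_id, chrg)
);

-- During the migration the application writes every row to both tables.
-- A logged batch is one way to keep the two writes together.
BEGIN BATCH
    INSERT INTO charges    (account_id, chrg, amount) VALUES ('acct-1', 1500000000, 12.50);
    INSERT INTO charges_v2 (account_id, chrg, amount) VALUES ('acct-1', 1500000000, 12.50);
APPLY BATCH;
```

The dual writes have to finish before chrg passes 2,147,483,647, the largest value a CQL int can hold; after the cutover, larger values go only to charges_v2.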

How to implement a fixed number of (timeuuid) columns in Cassandra (with CQL)?

Here is an example use case:
You need to store the last N (let's say 1000, as a fixed bucket size) user actions, with all details, in timeuuid-based columns.
Normally, each user's actions are already in a "UserAction" column family, with the user id as the row key and the actions in timeuuid columns. You may also have an "AllActions" column family that stores all actions with the same timeuuid as the column name and the user id as the column value. It's basically a relationship column family, but unfortunately without any details of the user actions. Querying with this column family is expensive, I guess, because of the random partitioner. On the other hand, if you store all the details in the "AllActions" CF, then at some point Cassandra can't handle such a big row properly. This is why I want to store the last N user actions with all details in a fixed number of timeuuid-based columns.
Maybe you have a better design for this use case... I'd like to hear it.
If not, the question is: how can a fixed number of (timeuuid) columns be implemented effectively in Cassandra (with CQL)?
After insertion, we could delete the old (overflow) columns if CQL's DELETE had some sort of range support. AFAIK there is no support for this.
So, any ideas? Thanks in advance...
IMHO, this is something that C* should handle itself, like compaction. It's not a good idea to handle this on the client side.
Maybe we need some configuration (storage) options for column families to make them suitable for "most recent data".
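One way to push most of this to the server with CQL3: cluster the actions by timeuuid in descending order, read the newest N with LIMIT, and set a table-level TTL so Cassandra expires old rows on its own. This is a sketch under assumptions: the table name recent_actions, the per-day partition bucket used to keep rows bounded, and the 30-day TTL are all hypothetical, and a TTL bounds rows by age rather than by count, so it only approximates a fixed N:

```sql
-- All users' actions with full details, partitioned by day so no single row grows without bound.
CREATE TABLE recent_actions (
    day_bucket text,       -- e.g. '2013-05-01' (hypothetical bucketing choice)
    action_id  timeuuid,
    user_id    text,
    action     text,
    details    text,
    PRIMARY KEY (day_bucket, action_id)
) WITH CLUSTERING ORDER BY (action_id DESC)   -- newest actions first within a bucket
  AND default_time_to_live = 2592000;          -- rows expire after 30 days (assumed retention)

-- now() generates a timeuuid for the current time.
INSERT INTO recent_actions (day_bucket, action_id, user_id, action, details)
VALUES ('2013-05-01', now(), 'u42', 'login', 'web client');

-- Newest 1000 actions in the bucket; if there are fewer, the application also reads the previous day.
SELECT * FROM recent_actions WHERE day_bucket = '2013-05-01' LIMIT 1000;
```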

How to define dynamic columns in a column family in Cassandra?

We don't want to fix the column definitions when creating a column family, as we might have to insert new columns into it later. Is it possible to achieve this? I am wondering whether it is possible not to define the column metadata when creating the column family, but instead to specify the column when the client updates data, for example:
CREATE COLUMN FAMILY products WITH default_validation_class = UTF8Type AND key_validation_class = UTF8Type AND comparator = UTF8Type;
set products['1001']['brand'] = 'Sony';
Thanks,
Fan
Yes, this is possible without any special effort. Per the DataStax documentation of the Cassandra data model (a good read, by the way, along with the CQL spec):
The Cassandra data model is a schema-optional, column-oriented data model. This means that, unlike a relational database, you do not need to model all of the columns required by your application up front, as each row is not required to have the same set of columns. Columns and their metadata can be added by your application as they are needed without incurring downtime to your application.
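In CQL3 the same schema-optional idea is usually expressed with a clustering column: each "dynamic column" becomes a row, keyed by its name, inside the product's partition. A minimal sketch with hypothetical names (product_attrs, attr_name, attr_value), roughly the CQL counterpart of the cassandra-cli example in the question:

```sql
-- One partition per product; attr_name plays the role of the dynamic column name.
CREATE TABLE product_attrs (
    product_id text,
    attr_name  text,
    attr_value text,
    PRIMARY KEY (product_id, attr_name)
);

-- New "columns" need no schema change: just write a new attr_name.
INSERT INTO product_attrs (product_id, attr_name, attr_value) VALUES ('1001', 'brand', 'Sony');
INSERT INTO product_attrs (product_id, attr_name, attr_value) VALUES ('1001', 'color', 'black');

-- Returns both attributes for the product, ordered by attr_name.
SELECT attr_name, attr_value FROM product_attrs WHERE product_id = '1001';
```

For a named, typed column that rows may carry, ALTER TABLE ... ADD also adds it while the cluster stays online.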
