Partition key with UDT field in CQL - cassandra

I have a little question concerning the partition key in Cassandra.
When I create a table which contains a field called flxB whose type is a UDT like this:
CREATE TYPE fluxes (
flux float,
flux_prec smallint,
flux_error float,
flux_error_prec smallint,
flux_bibcode text,
system text
);
Can I put the field flxB.flux in my partition key?

No, you can't put flxB.flux in any part of the primary key.
Before Cassandra 3.6, every UDT column must be declared frozen, and a UDT column that is part of a primary key must be frozen in every version.
When using the frozen keyword, you cannot update parts of a user-defined type value. The entire value must be overwritten. Cassandra treats the value of a frozen, user-defined type like a blob.
In Cassandra, every part of the primary key must be present when inserting or updating. If Cassandra allowed you to put flxB.flux in the partition key, how would it make sure that every part of the primary key is present in an insert/update query?
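The frozen behaviour can be pictured with a Python analogy (an analogy only, not how Cassandra stores data): a frozen dataclass mirroring the fluxes type cannot have a single field patched in place, so an "update" has to build a whole new value, just as a frozen UDT must be overwritten entirely. The class name Fluxes, the helper update_flux and the sample values are made up for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


# Hypothetical Python analogue of the CQL "fluxes" UDT used as frozen<fluxes>.
@dataclass(frozen=True)
class Fluxes:
    flux: float
    flux_prec: int
    flux_error: float
    flux_error_prec: int
    flux_bibcode: str
    system: str


def update_flux(old: Fluxes, new_flux: float) -> Fluxes:
    """Return a complete new value; a frozen value cannot be changed field by field."""
    return replace(old, flux=new_flux)


original = Fluxes(1.5, 2, 0.1, 2, "hypothetical-bibcode", "V")

try:
    original.flux = 2.0  # patching one field in place is rejected
    patched_in_place = True
except FrozenInstanceError:
    patched_in_place = False

updated = update_flux(original, 2.0)  # the whole value is rewritten instead
```

The same "whole value" view is why a frozen UDT column can be used as a key column as a unit, while one of its fields cannot.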

Related

CassandraQL allow filtering

I have created a table in a Cassandra database, but I am getting an ALLOW FILTERING error when I query it:
CREATE TABLE device_check (
device_id int,
checked_at bigint,
is_power boolean,
is_locked boolean,
PRIMARY KEY ((device_id), checked_at)
);
When I run the query
SELECT * FROM device_check where checked_at > 1234432543
But it gives an ALLOW FILTERING error. I tried removing the parentheses around device_id, but I get the same error. Even when I set only checked_at as the primary key, it still won't work with the > operator. With the = operator it works.
A PRIMARY KEY in Cassandra contains two types of keys:
Partition key
Clustering Key
It is expressed as PRIMARY KEY((partition key), clustering keys).
Cassandra is a distributed database in which data can be on any of the nodes, depending on the partition key. To search data fast, Cassandra asks users to send the partition key, so it can identify the node where the data resides and query only that node. If you don't give the partition key in your query, Cassandra has to search all the nodes, so it complains that you are not querying the right way: it rejects the query unless you explicitly add ALLOW FILTERING.
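The same answer applies to > not being supported on the partition key. With the default partitioner, partitions are placed on nodes by a hash (token) of the partition key, so a range over partition key values does not map to a range of nodes, and Cassandra would again have to scan all the nodes to respond, which is not the right way to use Cassandra. A range on the clustering key does work once the partition is fixed, e.g. SELECT * FROM device_check WHERE device_id = 1 AND checked_at > 1234432543.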
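The placement argument can be sketched in Python. This is a simplified model under stated assumptions: three hypothetical nodes, each owning an equal contiguous slice of a 64-bit token space, and MD5 standing in for the Murmur3 hash that Cassandra's default partitioner actually uses. All function names are hypothetical.

```python
import hashlib

NUM_NODES = 3  # hypothetical cluster size


def token(device_id: int) -> int:
    # Stand-in for the partitioner: hash the partition key into a 64-bit token.
    # (Cassandra's default Murmur3Partitioner uses Murmur3, not MD5.)
    digest = hashlib.md5(str(device_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def node_for(device_id: int) -> int:
    # Each node owns one equal, contiguous range of the token space.
    return (token(device_id) * NUM_NODES) >> 64


def nodes_for_equality(device_id: int) -> set:
    # WHERE device_id = ?  -> the coordinator knows the single node to ask
    return {node_for(device_id)}


def nodes_for_key_range(low: int, high: int) -> set:
    # WHERE device_id > low AND device_id < high  -> neighbouring ids hash to
    # unrelated tokens, so their partitions are scattered across the nodes
    return {node_for(d) for d in range(low + 1, high)}


def nodes_without_partition_key() -> set:
    # WHERE checked_at > ?  -> any partition may match, so every node is read
    return set(range(NUM_NODES))
```

Only the equality lookup stays on one node; both other query shapes fan out, which is what ALLOW FILTERING would make you opt into.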

GeoMesa: Cassandra table with composite key

Is it possible to create a Cassandra table with GeoMesa specifying keys (i.e., a composite key)? I have a Spark job that writes to Cassandra, and a composite key is necessary for the output table. I would now like to create/write that same table through the GeoMesa API instead of directly to Cassandra. The format is like this:
CREATE TABLE IF NOT EXISTS mykeyspace.testcompkey (pkey1 text, ckey1 int, attr1 int, attr2 int, minlat decimal, minlong decimal, maxlat decimal, maxlong decimal, updatetime text, PRIMARY KEY((pkey1), ckey1) )
Is this possible? You can also see in the CREATE TABLE statement that I have a partition key and a clustering key. From what I have read, I believe GeoServer supports both simple and complex features. I am just wondering whether that support also carries over to Cassandra with GeoMesa.
Thank you
GeoMesa does use composite partition and clustering keys for Cassandra tables, but the keys are not configurable by the user - they are designed to facilitate spatial/temporal/attribute CQL queries.
Keys can be seen in the index table implementations here. The columns field (for example here) defines the primary keys. Columns with partition = true are used for partitioning, the rest are used for clustering.
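As a rough illustration of the rule described above (columns flagged as partition columns form the partition key, the rest become clustering columns), this Python sketch renders a CQL PRIMARY KEY clause from a list of key columns. KeyColumn and primary_key_clause are hypothetical names, not GeoMesa's actual classes or API.

```python
from typing import NamedTuple, Sequence


class KeyColumn(NamedTuple):
    """Hypothetical key-column description: a name plus a partition flag."""
    name: str
    partition: bool = False


def primary_key_clause(columns: Sequence[KeyColumn]) -> str:
    """Render a CQL PRIMARY KEY clause: flagged columns form the partition key,
    the remaining columns (in order) become clustering columns."""
    partition = [c.name for c in columns if c.partition]
    clustering = [c.name for c in columns if not c.partition]
    if not partition:
        raise ValueError("at least one column must be a partition column")
    parts = ["(" + ", ".join(partition) + ")"] + clustering
    return "PRIMARY KEY (" + ", ".join(parts) + ")"
```

For the asker's table, primary_key_clause([KeyColumn("pkey1", True), KeyColumn("ckey1")]) gives PRIMARY KEY ((pkey1), ckey1).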

Cassandra: Does timeuuid preserve order?

I was using a timestamp as the primary key for my data by calling toTimestamp(now()), but unfortunately this creates collisions.
I understand that timeuuid guarantees uniqueness, but if I do ORDER BY timeuuid, does timeuuid also guarantee the original order?
From the docs:
Timeuuid types can be entered as integers for CQL input. A value of the timeuuid type is a Version 1 UUID. A Version 1 UUID includes the time of its generation and are sorted by timestamp, making them ideal for use in applications requiring conflict-free timestamps. For example, you can use this type to identify a column (such as a blog entry) by its timestamp and allow multiple clients to write to the same partition key simultaneously. Collisions that would potentially overwrite data that was not intended to be overwritten cannot occur.
http://docs.datastax.com/en/cql/3.3/cql/cql_reference/uuid_type_r.html
http://docs.datastax.com/en/cql/3.3/cql/cql_reference/timeuuid_functions_r.html
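A short Python sketch of what "sorted by timestamp" means for a Version 1 UUID. make_timeuuid and timeuuid_sort_key are hypothetical helpers; make_timeuuid builds UUIDs from explicit timestamps so the example is deterministic. Keep in mind that the order reflects the clock of whichever machine generated each value, so with several writers it is only as reliable as their clock synchronization.

```python
import uuid


def make_timeuuid(timestamp: int, clock_seq: int, node: int) -> uuid.UUID:
    """Build a version 1 UUID from an explicit 60-bit timestamp
    (100-ns intervals since 1582-10-15), using the same field layout as uuid.uuid1()."""
    return uuid.UUID(
        fields=(
            timestamp & 0xFFFFFFFF,      # time_low
            (timestamp >> 32) & 0xFFFF,  # time_mid
            (timestamp >> 48) & 0x0FFF,  # time_hi (version bits are set by version=1)
            (clock_seq >> 8) & 0x3F,     # clock_seq_hi (variant bits are set by version=1)
            clock_seq & 0xFF,            # clock_seq_low
            node,
        ),
        version=1,
    )


def timeuuid_sort_key(u: uuid.UUID):
    # Compare by the embedded timestamp first; the raw bytes only break ties
    # (an assumption: Cassandra's exact tie-breaking rule may differ).
    return (u.time, u.bytes)


earlier = make_timeuuid(0xFFFF_FFFF, clock_seq=0, node=1)
later = make_timeuuid(0x1_0000_0000, clock_seq=0, node=1)
```

Python's default UUID ordering compares the 128-bit integer, whose leading bits are time_low, so it is not chronological: sorted([earlier, later]) puts later first, while sorting with key=timeuuid_sort_key restores the time order.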

An Approach to Cassandra Data Model

Please note that this is my first time using NoSQL, and pretty much every concept in this world is new to me, having worked with RDBMSs for a long time!
In one of my heavily used applications, I want to use NoSQL for part of the data and move it out of MySQL, where transactions and the relational model don't make sense. What I would gain is the AP of CAP (Availability and Partition tolerance).
The present data model is simple as this
ID (integer) | ENTITY_ID (integer) | ENTITY_TYPE (String) | ENTITY_DATA (Text) | CREATED_ON (Date) | VERSION (integer)
We can safely assume that this part of the application is similar to activity logging.
I would like to move this to NoSQL, per my requirements, and separate it from the performance-oriented MySQL DB.
Cassandra says everything in it is a simple Map<Key, Value>. Thinking at the map level,
I can use ENTITY_ID|ENTITY_TYPE|ENTITY_APP as the key and store the rest of the data in the values.
After reading about user-defined types in Cassandra: can I use a user-defined type as the value, which would essentially give me one key with multiple values? Otherwise, I would use normal columns without a user-defined type. One idea is to use the same model for different applications across systems, so that simple logging/activity data from all of them can be pushed to the same table, since the key varies from application to application and, within an application, each entity will be unique.
No application or business function accesses this data without the key; in simple terms, there is no requirement to fetch data randomly.
References: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/
Let me explain the Cassandra data model a bit (or at least a part of it). You create tables like so:
create table event(
id uuid,
timestamp timeuuid,
some_column text,
some_column2 list<text>,
some_column3 map<text, text>,
some_column4 map<text, text>,
primary key (id, timestamp)  -- further clustering columns may follow timestamp
);
Note the primary key: multiple columns are specified. The first column is the partition key, and all "rows" in a partition are stored together. Inside a partition, data is ordered by the second, then third, then fourth... keys in the primary key; these are called clustering keys. To query, you almost always hit a partition (by specifying equality on the partition key in the WHERE clause). Any further filters in your query are then applied within the selected partition. If you don't specify a partition key, you make a cluster-wide query, which may be slow or, most likely, time out. After hitting the partition, you can filter with equality matches on the subsequent clustering keys in order, with a range query on the last clustering key specified in your query. Anyway, that's all about querying.
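To make the "equality on the partition, then prefix equality and a range on the last clustering key" rule concrete, here is a Python sketch of one partition whose rows are kept sorted by a two-column clustering key (ck1, ck2). It is a simplified model, not Cassandra's storage engine; Partition, put and query are hypothetical names.

```python
import bisect


class Partition:
    """Rows of one partition, kept sorted by their clustering key (ck1, ck2)."""

    def __init__(self) -> None:
        self._keys: list = []    # clustering keys in sorted order
        self._rows: dict = {}    # clustering key -> row value

    def put(self, ck1, ck2, value: str) -> None:
        key = (ck1, ck2)
        if key not in self._rows:
            bisect.insort(self._keys, key)  # keep clustering order on insert
        self._rows[key] = value

    def query(self, ck1, ck2_after):
        """Equivalent of: WHERE <partition key> = ? AND ck1 = ? AND ck2 > ?"""
        # Jump straight to the first row after (ck1, ck2_after) ...
        start = bisect.bisect_right(self._keys, (ck1, ck2_after))
        result = []
        for key in self._keys[start:]:
            if key[0] != ck1:  # ... and stop once we leave the ck1 = ? prefix
                break
            result.append((key, self._rows[key]))
        return result
```

Because the rows are sorted, the answer is one contiguous run found by a binary search; a filter on ck2 without fixing ck1 would not be contiguous, which is why the clustering keys must be constrained in order.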
In terms of structure, you have a few column types: some primitives like text, int, etc., but also three collections: sets, lists and maps. Yes, maps. UDTs are typically more useful when used in collections; e.g. a Person may have a map of addresses, with an address UDT as the map's value type. You would typically store info in columns if you needed to query on it or index on it, or if you know each row will have those columns. You're also free to use a map column, which lets you store "arbitrary" key-value data, which seems to be what you're looking to do.
One thing to watch out for: your primary key is unique per record. If you do another insert with the same primary key, you won't get an error; it'll simply overwrite the existing data. Everything in Cassandra is an upsert. And you won't be able to change the value of any primary key column for an existing row.
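A minimal Python sketch of the upsert behaviour, modelling a table as a dict keyed by the full primary key. It assumes plain INSERT/UPDATE semantics (columns named in a write overwrite old values, columns not named are left alone) and ignores write timestamps, TTLs and tombstones; the names table, upsert and change_key are hypothetical.

```python
from typing import Dict, Tuple

PrimaryKey = Tuple[str, int]  # hypothetical (id, timestamp) primary key
table: Dict[PrimaryKey, Dict[str, str]] = {}


def upsert(pk: PrimaryKey, columns: Dict[str, str]) -> None:
    # No "duplicate key" error: columns named in the write replace any existing
    # values for this primary key; columns not mentioned keep their old values.
    table.setdefault(pk, {}).update(columns)


def change_key(old_pk: PrimaryKey, new_pk: PrimaryKey) -> None:
    # A primary key value cannot be updated in place; the row has to be
    # written under the new key and deleted under the old one.
    upsert(new_pk, table[old_pk])
    del table[old_pk]
```

Writing the same key twice leaves one row holding the latest values, which is the "silent overwrite" to watch for.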
You mentioned querying is not a factor. However, if you do find yourself needing to do aggregations, you should check out Apache Spark, which works very well with Cassandra (and also supports relational data sources, so you should be able to aggregate data across MySQL and Cassandra for analytics).
Lastly, if your data is time series log data, Cassandra is a very good choice.

Cassandra 1.2: Updating type in primary key CQL3

We currently have a table defined as below
create table tableA(id int,
seqno int,
data text,
PRIMARY KEY((id), seqno))
WITH CLUSTERING ORDER BY (seqno DESC);
We need to change the type of the id column from int to text. We are wondering which of the two approaches below would be the most advisable:
1. ALTER TABLE tableA ALTER id TYPE varchar; (the command succeeds, but then we have issues reading the data. Is this because ALTER TABLE doesn't update the underlying storage of the id column?)
2. COPY to/from oldtable/newtable. This works, but we run into RPC timeouts (which we can raise). Is this a bad idea for a table spread across a cluster?
We have checked the online docs, and these are the only two options we can find. Are there other options?
Thanks
Paul
I would say option 1 isn't really supported. ALTER only changes the declared type; the bytes already stored for id are still 4-byte integers. Unless those bytes happen to be valid strings, you're going to have problems; you're probably seeing key validation errors.
For option 2, you probably just need to copy smaller chunks of data for each read/write.
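To see why reading fails, here is a small Python sketch, assuming the id cells were written as CQL int (a 4-byte big-endian signed integer) and that after the ALTER the same bytes are validated as UTF-8 text. The helper names are hypothetical.

```python
import struct
from typing import Optional


def int_cell_bytes(value: int) -> bytes:
    # CQL int is serialized as a 4-byte big-endian signed integer.
    return struct.pack(">i", value)


def read_as_text(raw: bytes) -> Optional[str]:
    # What a text/varchar validator would attempt on the unchanged bytes.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None  # not valid UTF-8: the kind of value that fails validation
```

The id 1 comes back as the four control characters "\x00\x00\x00\x01" rather than "1", and the id 200 (bytes 00 00 00 C8) is not valid UTF-8 at all.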
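A sketch of copying in smaller chunks, in Python. read_rows and write_rows are hypothetical stand-ins for a paged read from the old table and a write to the new one (e.g. via a driver); the chunk size of 500 is an arbitrary assumption to tune against your timeouts.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

OldRow = Tuple[int, int, str]  # (id, seqno, data) as stored in tableA
NewRow = Tuple[str, int, str]  # the same row with id converted to text


def chunked(rows: Iterable[OldRow], size: int) -> Iterator[List[OldRow]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def copy_in_chunks(
    read_rows: Iterable[OldRow],
    write_rows: Callable[[List[NewRow]], None],
    size: int = 500,
) -> int:
    """Copy rows a chunk at a time, converting id from int to text; return rows copied."""
    copied = 0
    for batch in chunked(read_rows, size):
        write_rows([(str(id_), seqno, data) for id_, seqno, data in batch])
        copied += len(batch)
    return copied
```

Keeping each request small trades total copy time for requests that finish well inside the RPC timeout.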
