Dynamic Column Family in Datastax Cassandra

There is a way to create a dynamic column family in Cassandra through CQL 3, i.e. using a composite primary key with COMPACT STORAGE.
For inserting data into a dynamic column family (wide rows), which is the more efficient way: the DataStax Java driver or the Thrift API?
Since I'm using DataStax, and DataStax strongly suggests using non-compact tables for new development even though non-compact tables are less "compact" internally, how should I create a dynamic column family: with COMPACT STORAGE or without COMPACT STORAGE?
Please suggest.

To answer your other question: you should stay away from the Thrift API for new development. If you use the Thrift API you will miss out on all the great features of CQL 3, including:
Compound Keys
Collections & other types
Usability
Use the latest DataStax Java driver (2.1)!
http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/whatsNew2.html

To do something similar to what you previously did with a "wide row", you will want to use a compound primary key that includes clustering columns:
CREATE TABLE data (
sensor_id int,
collected_at timestamp,
volts float,
PRIMARY KEY (sensor_id, collected_at)
);
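As a minimal sketch against the table above (the sample values are made up for illustration), each new reading for a sensor becomes another clustered row inside the same partition, which is the CQL 3 equivalent of adding a column to the old wide row:
-- Each insert adds a clustered row to sensor 1's partition,
-- playing the role of a new column in the old wide-row model.
INSERT INTO data (sensor_id, collected_at, volts)
VALUES (1, '2015-01-01 00:00:00', 3.1);
INSERT INTO data (sensor_id, collected_at, volts)
VALUES (1, '2015-01-01 00:05:00', 3.2);
-- Read a slice of the partition via the clustering column.
SELECT collected_at, volts FROM data
WHERE sensor_id = 1 AND collected_at >= '2015-01-01 00:00:00';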
See the following blog posts for more information on using wide partitions in CQL3 and how old thrift terminology relates to what you can do in CQL3:
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
http://www.datastax.com/dev/blog/cql3-for-cassandra-experts
http://www.datastax.com/dev/blog/thrift-to-cql3

Related

GeoMesa: Cassandra table with composite key

Is it possible to create a Cassandra table with GeoMesa specifying keys (i.e. a composite key)? I have a Spark job that writes to Cassandra, and a composite key is necessary for the output table. I would now like to create/write that same table through the GeoMesa API instead of directly to Cassandra. The format is like this:
CREATE TABLE IF NOT EXISTS mykeyspace.testcompkey (
pkey1 text,
ckey1 int,
attr1 int,
attr2 int,
minlat decimal,
minlong decimal,
maxlat decimal,
maxlong decimal,
updatetime text,
PRIMARY KEY ((pkey1), ckey1)
)
Is this possible? You can also see in the create table statement that I have a partition key and a clustering key. From what I have read, I believe GeoServer supports both Simple and Complex features; I am just wondering if that support also carries over to Cassandra with GeoMesa.
Thank you
GeoMesa does use composite partition and clustering keys for Cassandra tables, but the keys are not configurable by the user - they are designed to facilitate spatial/temporal/attribute CQL queries.
Keys can be seen in the index table implementations here. The columns field (for example here) defines the primary keys. Columns with partition = true are used for partitioning, the rest are used for clustering.

How to understand the primary key in Apache Cassandra?

I'm new to Apache Cassandra; I have installed Cassandra and use cqlsh on my laptop.
I created a table using:
create table userpageview( created_at timestamp, hit int, userid int, variantid int, primary key (created_at, hit, userid, variantid) );
and inserted several rows into the table, but when I tried to select using a condition on each column (I mean one by one), it gave an error.
Maybe my data modelling is wrong; can anyone tell me how to do data modelling in Cassandra?
Thanks
You need to read about partition keys and clustering keys. Cassandra works very differently from relational databases, and the types of queries you can do are much more restricted.
Some information to get you started: here and here.
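To make the restriction concrete with a hedged sketch (the remodeled table name and key layout below are just one possible choice, not part of the original answer): with PRIMARY KEY (created_at, hit, userid, variantid), only created_at is the partition key, so Cassandra rejects filters on hit, userid, or variantid alone. A query-driven remodel for "page views for one user, ordered by time" could look like:
-- Hypothetical remodel: userid becomes the partition key,
-- created_at a clustering column (views at the exact same timestamp would collide).
CREATE TABLE userpageview_by_user (
userid int,
created_at timestamp,
variantid int,
hit int,
PRIMARY KEY (userid, created_at)
);
-- Valid query: equality on the partition key, range on the clustering column.
SELECT * FROM userpageview_by_user
WHERE userid = 42 AND created_at >= '2015-01-01';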

Issue with NoSql data model

Being a newbie, I'm facing issues with data modelling in Cassandra. We are planning to use Cassandra for reporting purposes, and in the reports we need to filter data by multiple parameters. Let's say we have a column family:
Create table cf_data
(
Date varchar,
Attribute1 varchar,
Attribute2 varchar,
Attribute3 varchar,
Attribute4 varchar,
Attribute5 varchar,
Attribute6 varchar,
Primary Key(Date)
);
We need to support queries like:
Select * from cf_data where date = '2015-02-02' and Attribute1 in ('asdf','assf','asdf') and Attribute2 in ('wewer','werwe') and Attribute3 in ('sdfsd','werwe') and Attribute4 in ('weryewu','ghjghjh')
I know we need to respect the primary key restrictions while querying the column family. Cassandra's internal storage works like
SortedMap<String, SortedMap<Key, Value>>
NoSQL works on the principle of storing denormalized data according to the access pattern. If I need to satisfy the above query, how should I model the column family? From the report UI, users can select values for Attribute1, Attribute2, Attribute3, etc. from drop-downs. One option could be running Spark on top of the Cassandra nodes to support SQL queries, but it would be better to model the column family the way Cassandra expects.
Any pointers?
From the Datastax CQL documentation:
"Under most conditions, using IN in the WHERE clause is not recommended. Using IN can degrade performance because usually many nodes must be queried."
If you need to use Spark to support SQL queries, you may be better off using a proper SQL database. Just because NoSQL is popular doesn't mean you need to follow it; not all data can be efficiently modeled in every NoSQL database.
Another, inefficient, option is to query by date only, without the attribute predicates, and do the filtering in the application, at the risk of a large response latency. If the reports are not created in real time or near real time, that should be fine.
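If application-side filtering is too slow, one common Cassandra pattern is a denormalized query table per access pattern, as the question itself hints at. A hedged sketch (the table name and key layout are assumptions, not part of the answer above) that turns the filterable attributes into clustering columns:
-- Hypothetical query table, duplicated on write so reads match the key layout.
-- All attributes are in the key so distinct rows are not overwritten.
CREATE TABLE cf_data_by_attributes (
date varchar,
attribute1 varchar,
attribute2 varchar,
attribute3 varchar,
attribute4 varchar,
attribute5 varchar,
attribute6 varchar,
PRIMARY KEY ((date), attribute1, attribute2, attribute3, attribute4, attribute5, attribute6)
);
-- Served without IN: equality on the partition key (date),
-- then equality on a left-to-right prefix of the clustering columns.
SELECT * FROM cf_data_by_attributes
WHERE date = '2015-02-02' AND attribute1 = 'asdf' AND attribute2 = 'wewer' AND attribute3 = 'sdfsd';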

Which is more write-efficient: "create table" with the COMPACT STORAGE option or with the CLUSTERING ORDER option?

I am designing a schema for a problem statement that is both read- and write-critical.
Which will be more write- and read-efficient: creating the table with COMPACT STORAGE or creating the table with CLUSTERING ORDER?
As per my requirement, clustering order helps me save some time during reads, but at the same time I fear that it could affect insertion.
Can anyone tell?
Compact storage is for backwards compatibility with Thrift apps; I'd recommend avoiding it. From the official docs:
Using compact storage
The compact storage directive is used for backward compatibility of old applications with CQL. Use the directive to store data in the legacy (Thrift) storage engine format. To take advantage of CQL capabilities, do not use this directive in new applications.
CREATE TABLE sblocks (
block_id uuid,
subblock_id uuid,
data blob,
PRIMARY KEY (block_id, subblock_id)
) WITH COMPACT STORAGE;
Using the compact storage directive prevents you from defining more than one column that is not part of a compound primary key. A compact table using a primary key that is not compound can have multiple columns that are not part of the primary key.
A compact table that uses a compound primary key must define at least one clustering column. Columns cannot be added nor removed after creation of a compact table. Unless you specify WITH COMPACT STORAGE, CQL creates a table with non-compact storage.
A table with a clustering order really has no penalty over a table without one. Writes always go into the memtable (since Cassandra uses log-structured storage) and are appended more or less like log entries. Clustering keys really help while reading, to seek to the right CQL row inside a partition. Searching using a clustering key is very efficient and is really the recommended way to do things.
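For reference, a minimal sketch of a table declared with a clustering order (the table and column names are made up for illustration):
-- CLUSTERING ORDER only controls the sort order inside a partition;
-- writes still go to the memtable and commit log as usual.
CREATE TABLE events_by_user (
user_id uuid,
event_time timestamp,
payload text,
PRIMARY KEY (user_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
-- Reads return the newest events first without re-sorting.
SELECT event_time, payload FROM events_by_user
WHERE user_id = 123e4567-e89b-12d3-a456-426655440000
LIMIT 10;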
I don't have the rep for a comment so thought I'd leave this here for anyone who stumbles upon this question and is using C* >= 3.0.
Cassandra's storage engine was refactored in version 3.0. Data is now stored more compactly on disk by default. There is no benefit to using the COMPACT STORAGE option besides backwards Thrift compatibility; in fact, it should be avoided altogether.
DataStax Reference

Hector support for CQL3 specific features (Partition & Clustering keys) and Compact Storage option

I'm trying to leverage a specific feature of Apache Cassandra CQL3: partition and clustering keys for tables that are created with the compact storage option.
For example:
CREATE TABLE EMPLOYEE(id uuid, name text, field text, value text, primary key(id, name , field )) with compact storage;
I've created the table via CQL3 and I'm able to insert rows successfully using the Hector API.
But I couldn't find the right set of options in the Hector API to create the table itself as I require.
To elaborate a little bit more:
In ColumnFamilyDefinition.java I couldn't see an option for setting the storage option (compact storage), and in ColumnDefinition.java I couldn't find an option to say that a column is part of the partition or clustering keys.
Could you please give me an idea of whether I can use Hector for this (i.e. creating the table) or not, and if I can, what options do I need to provide?
If you are not tied to Hector, you could look into the DataStax Java Driver, which was created to use CQL3 and Cassandra's binary protocol.
