I'm trying to do a simple insert into Cassandra with the Hector client.
I've created a simple table:
create table customs (id uuid PRIMARY KEY, char_col_1 varchar);
Then I try to insert something:
UUID columnId = UUID.randomUUID();
HColumn<String, String> column =
    HFactory.createColumn(
        "char_col_1",
        "test",
        12345,
        StringSerializer.get(),
        StringSerializer.get()
    );
Mutator<UUID> mutator = HFactory.createMutator(keyspace, UUIDSerializer.get());
mutator.insert(columnId, "customs", column);
But I always get this error:
InvalidRequestException(why:Not enough bytes to read value of component 0)
I think this is because Hector is unable to insert data (through Thrift) into a table with non-compact storage. I'm not a Hector expert, though.
If you create your column family as follows, it might work.
create table customs (id uuid PRIMARY KEY, char_col_1 varchar) with compact storage;
But why do you need to create a column family in CQL and insert data through Hector? That is not a recommended approach. You should choose either CQL or Thrift.
In your case, if you are planning to use Hector for all data operations, you can create the column family using Hector itself or cassandra-cli.
Or, if you want to stay with CQL for creating column families, you can use the DataStax Java driver instead of Hector.
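For what it's worth, the error message is consistent with how non-compact CQL3 tables lay out cell names. As far as I understand it, such a table uses a composite comparator, where every name component is written as a 2-byte length, the value bytes, and a 1-byte end-of-component marker, while the Hector code in the question sends the plain UTF-8 bytes of "char_col_1". The sketch below is a simplified, stdlib-only model of that layout, not Cassandra's or Hector's actual code; the names CompositeNameSketch, encodeComponent and readFirstComponent are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class CompositeNameSketch {
    // Encode one component: 2-byte length, value bytes, 1-byte end-of-component marker.
    static ByteBuffer encodeComponent(String value) {
        byte[] raw = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer out = ByteBuffer.allocate(2 + raw.length + 1);
        out.putShort((short) raw.length);
        out.put(raw);
        out.put((byte) 0);
        out.flip();
        return out;
    }

    // Read component 0 the way a composite comparator would (simplified).
    static String readFirstComponent(ByteBuffer name) {
        ByteBuffer bb = name.duplicate();
        if (bb.remaining() < 2) {
            throw new IllegalArgumentException("Not enough bytes to read length of component 0");
        }
        int length = bb.getShort() & 0xFFFF;
        if (bb.remaining() < length) {
            throw new IllegalArgumentException("Not enough bytes to read value of component 0");
        }
        byte[] raw = new byte[length];
        bb.get(raw);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A properly encoded composite name can be read back.
        System.out.println(readFirstComponent(encodeComponent("char_col_1")));
        // The plain column name, as a StringSerializer would send it, cannot.
        try {
            readFirstComponent(ByteBuffer.wrap("char_col_1".getBytes(StandardCharsets.UTF_8)));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Read as a composite, the first two bytes of the plain name ('c' and 'h') decode to a length of 25448, far more than the 8 bytes that follow, which is why the check on component 0 fails in this model.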
Related
I am new to Cassandra and I have read some articles about static and dynamic column families.
They mention that, from Cassandra 3 on, a table and a column family are the same thing.
I created a keyspace and some tables, and inserted data into one of them.
CREATE TABLE subscribers(
id uuid,
email text,
first_name text,
last_name text,
PRIMARY KEY(id,email)
);
INSERT INTO subscribers(id,email,first_name,last_name)
VALUES(now(),'Test@123.com','Test1','User1');
INSERT INTO subscribers(id,email,first_name,last_name)
VALUES(now(),'Test2@222.com','Test2','User2');
INSERT INTO subscribers(id,email,first_name,last_name)
VALUES(now(),'Test3@333.com','Test3','User3');
It all seems to work fine.
But what I need is to create a dynamic column family with only data types and no predefined columns.
An insert query should then be able to take different arguments and still insert into the table.
The articles mention that a dynamic column family does not need a schema (predefined columns).
I am not sure whether this is possible in Cassandra or whether my understanding is wrong.
Is this possible or not?
If it is, kindly provide some examples.
Thanks in advance.
I think the articles you're referring to were written in the early years of Cassandra, when it was based on the Thrift protocol. Cassandra Query Language (CQL) was introduced many years ago, and it is now the way to work with Cassandra - Thrift is deprecated in Cassandra 3.x and fully removed in 4.0 (not released yet).
If you really need fully dynamic columns, you can try to emulate them with a table whose columns are maps from text to a specific type, like this:
create table abc (
id int primary key,
imap map<text,int>,
tmap map<text,text>,
... more types
);
But you need to be careful - there are limitations and performance effects when using collections, especially if you want to store more than hundreds of elements.
Another approach is to store the data as individual rows:
create table xxxx (
id int,
col_name text,
ival int,
tval text,
... more types
primary key(id, col_name));
then you can insert individual values as separate columns:
insert into xxxx(id, col_name, ival) values (1, 'col1', 1);
insert into xxxx(id, col_name, tval) values (1, 'col2', 'text');
and select all columns as:
select * from xxxx where id = 1;
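If the application decides at run time which typed column to fill, a small helper can pick it from the Java type of the value. This is only a sketch against the xxxx table above: it builds the statement text with bind markers plus the values to bind, and leaves executing it to whatever driver you use. The class DynamicRowInsert and its forValue method are hypothetical names, and only the ival and tval columns are covered.

```java
import java.util.Arrays;
import java.util.List;

class DynamicRowInsert {
    final String cql;          // statement text with ? bind markers
    final List<Object> values; // values to bind, in marker order

    private DynamicRowInsert(String cql, List<Object> values) {
        this.cql = cql;
        this.values = values;
    }

    // Chooses the typed value column (ival or tval) from the runtime type of the value.
    static DynamicRowInsert forValue(int id, String colName, Object value) {
        final String typedColumn;
        if (value instanceof Integer) {
            typedColumn = "ival";
        } else if (value instanceof String) {
            typedColumn = "tval";
        } else {
            throw new IllegalArgumentException("No typed column for " + value);
        }
        String cql = "INSERT INTO xxxx (id, col_name, " + typedColumn + ") VALUES (?, ?, ?)";
        return new DynamicRowInsert(cql, Arrays.<Object>asList(id, colName, value));
    }

    public static void main(String[] args) {
        DynamicRowInsert first = forValue(1, "col1", 1);
        DynamicRowInsert second = forValue(1, "col2", "text");
        System.out.println(first.cql + " " + first.values);
        System.out.println(second.cql + " " + second.values);
    }
}
```

Only the column name is concatenated into the statement, and it comes from a fixed set; the id, column name and value all travel as bind values.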
Let's say I have a column family in Cassandra that was created using cassandra-cli like this:
create column family users with key_validation_class = UTF8Type and comparator = UTF8Type;
In the terms of the Thrift to CQL3 migration guide from DataStax, this is a dynamic column family.
When viewed from CQL3 client using DESCRIBE TABLE users it looks like this:
CREATE TABLE users (
key text,
column1 text,
value blob,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (column1 ASC);
That is the expected behavior. What I want is to add column metadata so that the column family is viewed as static.
So I tried this using cassandra-cli:
update column family users
with column_metadata = [{column_name: email, validation_class: UTF8Type}];
However the end result in CQL3 is not what I wanted:
CREATE TABLE users (
key text,
column1 text,
value blob,
email text,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (column1 ASC);
What I expected is the same result as when I create the column family with the metadata from the beginning:
create column family users2
with key_validation_class = UTF8Type
and comparator = UTF8Type
and column_metadata = [{column_name: email, validation_class: UTF8Type}];
In that case the CQL3 view of this is what I want:
CREATE TABLE users2 (
key text PRIMARY KEY,
email text
) WITH COMPACT STORAGE;
Is there some way I can add column metadata to a column family that was created without any, so that it is viewed from CQL3 the same way as if the metadata had been provided when the column family was created? Without re-creating the column family, of course.
It's not possible to create a static column using the old Thrift API. In fact, a static column is just a trick, i.e. a column with clustering value = NULL, so there is only 1 instance of it for each partition key.
See these 2 slides for the explanation (sorry, the text is in French):
http://www.slideshare.net/doanduyhai/cassandra-techniques-de-modlisation-avance/218
http://www.slideshare.net/doanduyhai/cassandra-techniques-de-modlisation-avance/219
You should take this opportunity to migrate to CQL. Thrift is deprecated and even disabled by default starting with Cassandra 3.x.
Ok I see what you mean. Look at the system keyspace, table schema_columnfamilies.
I think the labels of the partition keys and clustering columns are stored there.
It may be possible to change them, but I don't know if it's a good idea to hack into those meta tables directly.
If you have n nodes, you'll probably need to update the labels on all of them, since the system keyspace uses LocalStrategy.
Execute this query to see the actual labels:
SELECT key_aliases,key_validator,column_aliases,comparator
FROM system.schema_columnfamilies
WHERE keyspace_name='xxx'
AND columnfamily_name='users';
I am new to Cassandra and I am using the DataStax Java driver to access my keyspace. I have a legacy table that was created using a Cassandra Thrift client. I need to retrieve two column values from each partition in one query, like a MultigetSliceQuery in the Hector API. How can I do this using CQL and the DataStax Java driver?
--edit--
My column family is a legacy table, which looks like the following in cqlsh.
CREATE TABLE messages (
key blob,
column1 text,
value blob,
PRIMARY KEY ((key), column1)
);
I need to select two values for each key. In this table I store the messages of each user: the user id as row key, the message id as column name, and the message as value. I need to show the two latest messages from each user.
Try using an IN filter condition.
I think you should execute one request per partition (executing them concurrently if you need more than one partition). Assuming you want the top two in the natural order of 'column1':
SELECT column1, value FROM messages WHERE key=<blob> LIMIT 2;
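Running one request per partition concurrently can be sketched with plain CompletableFuture. In the sketch below, queryOnePartition is a hypothetical stand-in for whatever asynchronous call your driver offers for the SELECT above; nothing here is DataStax driver API, and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class PerPartitionFanOut {
    // Issues one asynchronous query per key, then collects the results in key order.
    static <K, V> Map<K, V> fetchAll(List<K> keys,
                                     Function<K, CompletableFuture<V>> queryOnePartition) {
        // Start every request first so they can run concurrently.
        Map<K, CompletableFuture<V>> pending = new LinkedHashMap<>();
        for (K key : keys) {
            pending.put(key, queryOnePartition.apply(key));
        }
        // Then wait for each result, keeping the caller's key order.
        Map<K, V> results = new LinkedHashMap<>();
        for (Map.Entry<K, CompletableFuture<V>> entry : pending.entrySet()) {
            results.put(entry.getKey(), entry.getValue().join());
        }
        return results;
    }

    public static void main(String[] args) {
        // Fake query for the demo: pretends every user has exactly two messages.
        Function<String, CompletableFuture<List<String>>> fakeQuery =
            user -> CompletableFuture.supplyAsync(
                () -> Arrays.asList(user + "-msg1", user + "-msg2"));
        System.out.println(fetchAll(Arrays.asList("alice", "bob"), fakeQuery));
    }
}
```

With a real driver, the function passed in would bind the partition key to the LIMIT 2 statement and adapt the driver's future type to a CompletableFuture.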
I have a cassandra column family with a lot of dynamic columns. I am running a simple Spark-Cassandra connector example where I am trying to fetch all the data from this table. The issue is that it is not fetching any of the dynamic columns from my column family.
In my example and code snippet below, it is able to fetch the primary key and the secondary index column for all the rows, but none of the other columns (there are 30+ more dynamic columns). I have a feeling that, as of now, the connector only supports fetching partition and clustering keys as columns, based on my reading here (Spark Datastax Java API Select statements). Could someone please confirm whether my understanding is correct? It would be great if someone could suggest how to fix this.
/**
 * Loads a cassandra column family as a spark RDD.
 */
public static CassandraJavaRDD<CassandraRow> getCassandraTableRDD(
        JavaSparkContext context, String keyspace, String table)
{
    return javaFunctions(context).cassandraTable(keyspace, table);
}
CREATE TABLE source_product_canonical_data_sample (
'key' text PRIMARY KEY,
source text
) WITH
comment='' AND
comparator=text AND
read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
default_validation=text AND
min_compaction_threshold=4 AND
max_compaction_threshold=32 AND
replicate_on_write='true' AND
compaction_strategy_class='SizeTieredCompactionStrategy' AND
compression_parameters:sstable_compression='LZ4Compressor';
Your CQL table definition is not aware of your "dynamic columns": there is no compound primary key with clustering columns in it. Dynamic columns / wide rows are terms from the old Thrift data model, and in CQL they have been replaced by compound primary keys.
See this excellent blog post by Jonathan Ellis explaining how to transition to the new data model: http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
I've just had a crash course in Cassandra over the last week and went from the Thrift API to CQL to grokking SuperColumns to learning I shouldn't use them and should use Composite Keys instead.
I'm now trying out CQL3, and it would appear that I can no longer insert into columns that are not defined in the schema, or see those columns in a select *.
Am I missing some option to enable this in CQL3, or does it expect me to define every column in the schema (defeating the purpose of wide, flexible rows, imho)?
Yes, CQL3 does require columns to be declared before used.
But you can do as many ALTERs as you want; no locking or performance hit is entailed.
That said, most of the places that you'd use "dynamic columns" in earlier C* versions are better served by a Map in C* 1.2.
I suggest you explore composite columns with "WITH COMPACT STORAGE".
A "COMPACT STORAGE" column family allows you to practically only define key columns:
Example:
CREATE TABLE entities_cargo (
entity_id ascii,
item_id ascii,
qt ascii,
PRIMARY KEY (entity_id, item_id)
) WITH COMPACT STORAGE
Actually, when you insert different values for item_id, you don't add a row with entity_id, item_id and qt; you add a column whose name is the item_id content and whose value is the qt content.
So:
insert into entities_cargo (entity_id,item_id,qt) values(100,'oggetto 1',3);
insert into entities_cargo (entity_id,item_id,qt) values(100,'oggetto 2',3);
Now, here is how you see these rows in CQL3:
cqlsh:goh_master> select * from entities_cargo where entity_id = 100;
entity_id | item_id | qt
-----------+-----------+----
100 | oggetto 1 | 3
100 | oggetto 2 | 3
And here is how they look if you check them from the CLI:
[default@goh_master] get entities_cargo[100];
=> (column=oggetto 1, value=3, timestamp=1349853780838000)
=> (column=oggetto 2, value=3, timestamp=1349853784172000)
Returned 2 results.
You can access a single column with
select * from entities_cargo where entity_id = 100 and item_id = 'oggetto 1';
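To make the mapping concrete, here is a toy in-memory model of it in plain Java. It only illustrates the layout described above and is not how Cassandra is implemented: each partition (entity_id) holds columns sorted by name, the item_id becomes the column name, and qt becomes the column value. The class WideRowModel and its methods are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class WideRowModel {
    // Row key -> columns sorted by column name, as the CLI view shows them.
    private final Map<String, NavigableMap<String, String>> partitions = new HashMap<>();

    // Like the CQL INSERT: adds (or overwrites) one column in the partition.
    void insert(String entityId, String itemId, String qt) {
        partitions.computeIfAbsent(entityId, k -> new TreeMap<>()).put(itemId, qt);
    }

    // Like SELECT * ... WHERE entity_id = ?: one CQL row per stored column.
    List<String[]> select(String entityId) {
        List<String[]> rows = new ArrayList<>();
        NavigableMap<String, String> columns =
            partitions.getOrDefault(entityId, new TreeMap<>());
        for (Map.Entry<String, String> column : columns.entrySet()) {
            rows.add(new String[] {entityId, column.getKey(), column.getValue()});
        }
        return rows;
    }

    // Like adding AND item_id = ?: looks up a single column by its name.
    String selectOne(String entityId, String itemId) {
        NavigableMap<String, String> columns = partitions.get(entityId);
        return columns == null ? null : columns.get(itemId);
    }

    public static void main(String[] args) {
        WideRowModel cargo = new WideRowModel();
        cargo.insert("100", "oggetto 1", "3");
        cargo.insert("100", "oggetto 2", "3");
        for (String[] row : cargo.select("100")) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```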
Hope it helps
Cassandra still allows using wide rows. This answer references that DataStax blog entry, written after the question was asked, which details the links between CQL and the underlying architecture.
Legacy support
Consider a dynamic column family defined through Thrift with the following command (notice there is no column-specific metadata):
create column family clicks
with key_validation_class = UTF8Type
and comparator = DateType
and default_validation_class = UTF8Type
Here is the exact equivalent in CQL:
CREATE TABLE clicks (
key text,
column1 timestamp,
value text,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
Both of these commands create a wide-row column family that stores records ordered by date.
CQL Extras
In addition, CQL provides the ability to assign labels to the row id, column and value elements to indicate what is being stored. The following, alternative way of defining this same structure in CQL, highlights this feature on DataStax's example - a column family used for storing users' clicks on a website, ordered by time:
CREATE TABLE clicks (
user_id text,
time timestamp,
url text,
PRIMARY KEY (user_id, time)
) WITH COMPACT STORAGE
Notes
a Table in CQL is always mapped to a Column Family in Thrift
the CQL driver uses the first element of the primary key definition as the row key
Composite Columns are used to implement the extra columns that one can define in CQL
using WITH COMPACT STORAGE is not recommended for new designs because it fixes the number of possible columns. In other words, ALTER TABLE ... ADD is not possible on such a table. Just leave it out unless it's absolutely necessary.
interesting, something I didn't know about CQL3. In PlayOrm, the idea is it is a "partial" schema you must define and in the WHERE clause of the select, you can only use stuff that is defined in the partial schema BUT it returns ALL the data of the rows EVEN the data it does not know about....I would expect that CQL should have been doing the same :( I need to look into this now.
thanks,
Dean