cqlsh:create super and sub columns? - cassandra

In cqlsh I want to create 1 super column address. Then below the address I want to create 2 sub columns, permanent and temporary address.
How can I do that using cql shell?

Super columns are obsolete. Try to make sure any documentation, books, or blogs you read are recent.

phact is right, you will want to distance yourself from anything that talks about super columns. The way to solve this with cql (from within cqlsh) is to create address as a user-defined type:
CREATE TYPE address (
street text,
city text,
postal text,
country text
);
Then you could build a table to implement a MAP of the address type.
CREATE TABLE users (
login text PRIMARY KEY,
first_name text,
last_name text,
addresses map<text, frozen <address>>
);
To INSERT values from cqlsh, you could use something like this:
INSERT INTO users (login,first_name,last_name,addresses)
VALUES ('jones','Theora','Jones',{'work':{street:'101 Big Network Drive',city:'New York', postal:'10023',country:'USA'},
'home':{street:'821 Wembley St.',city:'London',postal:'W11 2BQ',country:'GBR'}});
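To get closer to the "permanent" and "temporary" addresses from the question, the map keys can simply be named that way. The following is a minimal sketch against the users table above; the key name 'temporary' and the address values are made up for illustration.

```cql
-- Add (or overwrite) a single map entry without touching the others.
-- 'temporary' is an illustrative key; any text key works.
UPDATE users
SET addresses['temporary'] = {street: '12 Example Road', city: 'Leeds', postal: 'LS1 1AA', country: 'GBR'}
WHERE login = 'jones';

-- Read back every address stored for this user.
SELECT addresses FROM users WHERE login = 'jones';
```

Because the address values are frozen, an entry is replaced as a whole; individual fields inside an entry cannot be updated on their own.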

Related

How to create Cassandra primary key in correct way

I have the following table structure:
CREATE TABLE test_keyspace.persons (
id uuid,
country text,
city text,
address text,
phone_number text,
PRIMARY KEY (id, country, address)
);
My main scenario is to get person by id. But sometimes I want to get all cities inside country and all persons inside city as well.
I know that Cassandra must have at least one partition key and zero or more clustering keys, but I don't understand how to organize it to work most effectively (and generally work).
Can anybody give me advice?
So it sounds like you want to be able to query by both id and country. Typically in Cassandra, the way to build your data models is a "one table == one query" approach. In that case, you would have two tables, just keyed differently:
CREATE TABLE test_keyspace.persons_by_id (
id uuid,
country text,
city text,
address text,
phone_number text,
PRIMARY KEY (id));
TBH, you don't really need to cluster on country and address, unless a person can have multiple addresses. But a single PK is a completely legit approach.
For the second table:
CREATE TABLE test_keyspace.persons_by_country (
id uuid,
country text,
city text,
address text,
phone_number text,
PRIMARY KEY (country,city,id));
This will allow you to query by country, with persons grouped/sorted by city and sorted by id. In theory, you could also serve the query by id approach here, as long as you also had the country and city. But that might not be possible in your scenario.
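As a rough sketch, the queries these two tables are meant to serve could look like this. The country, city, and uuid values are placeholders, not data from the question.

```cql
-- All persons in a country: a single-partition read.
SELECT * FROM test_keyspace.persons_by_country
WHERE country = 'Germany';

-- All persons in one city of that country: partition key plus first clustering key.
SELECT * FROM test_keyspace.persons_by_country
WHERE country = 'Germany' AND city = 'Berlin';

-- The main scenario, lookup by id, goes to the other table.
SELECT * FROM test_keyspace.persons_by_id
WHERE id = 123e4567-e89b-12d3-a456-426614174000;
```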
Duplicating data in Cassandra (NoSQL) to help queries perform better is ok. The trick becomes keeping the tables in-sync, but you can use the BATCH functionality to apply writes to both tables atomically.
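A write to both tables in one logged batch might look like the sketch below. All values are placeholders; in practice the application would supply the same id and fields to both statements.

```cql
BEGIN BATCH
  INSERT INTO test_keyspace.persons_by_id (id, country, city, address, phone_number)
  VALUES (123e4567-e89b-12d3-a456-426614174000, 'Germany', 'Berlin', '1 Example Strasse', '555-0100');
  INSERT INTO test_keyspace.persons_by_country (id, country, city, address, phone_number)
  VALUES (123e4567-e89b-12d3-a456-426614174000, 'Germany', 'Berlin', '1 Example Strasse', '555-0100');
APPLY BATCH;
```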
In case you haven't already, you might benefit from DataStax's (free) course on data modeling - Data Modeling with Apache Cassandra and DataStax Enterprise.

Cassandra - Nodejs - Issue while retrieving list type values

for example below is the table structure.
CREATE TABLE table_name(
name text,
id text PRIMARY KEY,
details list<text>
)
Assume
details[0]-> contact number,
details[1]-> Address
I want to write a query to extract contact number from this table.
Actually, you should not store arrays of data. The best and simplest way will be to refactor your database to something like this.
CREATE TABLE table_name(
name text,
id text PRIMARY KEY,
contact text,
address text
);
Then you could do SELECT contact FROM table_name. If the same address can be reused between multiple entities, you may think about adding one more table for addresses. Note that Cassandra has no foreign keys or joins, so your application would have to look up the related rows itself.

Cassandra dynamic column family

I am new to cassandra and I read some articles about static and dynamic column family.
It is mentioned that, from Cassandra 3, table and column family are the same.
I created key space, some tables and inserted data into that table.
CREATE TABLE subscribers(
id uuid,
email text,
first_name text,
last_name text,
PRIMARY KEY(id,email)
);
INSERT INTO subscribers(id,email,first_name,last_name)
VALUES(now(),'Test@123.com','Test1','User1');
INSERT INTO subscribers(id,email,first_name,last_name)
VALUES(now(),'Test2@222.com','Test2','User2');
INSERT INTO subscribers(id,email,first_name,last_name)
VALUES(now(),'Test3@333.com','Test3','User3');
It all seems to work fine.
But what I need is to create a dynamic column family with only data types and no predefined columns.
With insert query I can have different arguments and the table should be inserted.
In articles, it is mentioned that for a dynamic column family there is no need to create a schema (predefined columns).
I am not sure if this is possible in Cassandra or whether my understanding is wrong.
Let me know if this is possible or not.
If possible, kindly provide some examples.
Thanks in advance.
I think the articles you're referring to were written in the early years of Cassandra, when it was based on the Thrift protocol. Cassandra Query Language (CQL) was introduced many years ago, and it is now the way to work with Cassandra: Thrift is deprecated in Cassandra 3.x and fully removed in 4.0.
If you really need to have fully dynamic stuff, then you can try to emulate this by using table with columns as maps from text to specific type, like this:
create table abc (
id int primary key,
imap map<text,int>,
tmap map<text,text>,
... more types
);
But you need to be careful: there are limitations and performance effects when using collections, especially if you want to store more than a few hundred elements.
Another approach is to store data as individual rows:
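As a sketch of how the map approach is used, assume for illustration that abc was created with only the id, imap, and tmap columns. The keys 'age', 'score', and 'nickname' are made-up names standing in for the dynamic columns.

```cql
-- Each map key plays the role of a dynamic column name.
INSERT INTO abc (id, imap, tmap)
VALUES (1, {'age': 30}, {'nickname': 'bob'});

-- Add or overwrite one "column" without rewriting the whole map.
UPDATE abc SET imap['score'] = 42 WHERE id = 1;

-- Remove one "column".
DELETE tmap['nickname'] FROM abc WHERE id = 1;

SELECT imap, tmap FROM abc WHERE id = 1;
```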
create table xxxx (
id int,
col_name text,
ival int,
tval text,
... more types
primary key(id, col_name));
Then you can insert each value as a separate row, with col_name acting as the dynamic column name:
insert into xxxx(id, col_name, ival) values (1, 'col1', 1);
insert into xxxx(id, col_name, tval) values (1, 'col2', 'text');
And select all of them for a given id with:
select * from xxxx where id = 1;
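Because col_name is a clustering key, a single dynamic column can also be read or removed on its own. A short sketch, assuming the xxxx table has just the id, col_name, ival, and tval columns shown above:

```cql
-- Read one dynamic column by name.
SELECT ival FROM xxxx WHERE id = 1 AND col_name = 'col1';

-- Remove one dynamic column; the other rows for id = 1 are unaffected.
DELETE FROM xxxx WHERE id = 1 AND col_name = 'col2';
```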

Text based selection of records using CQL

I have a text field like 'address' in my Cassandra table. I want to search records on the basis of some piece of text from the 'address' field like city or street name.
For example, I have an address like 'House No. 18, Shehzad Colony, M.D.A. Chowk Lahore'. Here I want to search for records having the partial string 'M.D.A. Chowk Lahore' in the address field.
How can I do this using the CQL shell? Can anyone guide me...
thanks...
There really isn't a way to do this out-of-the-box. In Cassandra, you need to design your tables to fit your query patterns. So if searching for addresses by city (or whatever) is a pattern you need to support, then there are a couple of ways to do this.
You can create a new query table, and partition by city:
CREATE TABLE userAddressesByCity (
userID uuid,
firstName text,
lastName text,
street text,
city text,
province text,
postalCode text,
PRIMARY KEY (city,userID));
This table structure would support querying by city as a partition key, and it also has userID as a clustering key to ensure uniqueness.
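A query against this table could then look like the sketch below. The uuid and the name values are placeholders; the street and city are taken from the example address in the question.

```cql
INSERT INTO userAddressesByCity (city, userID, firstName, lastName, street, province, postalCode)
VALUES ('Lahore', 123e4567-e89b-12d3-a456-426614174000, 'Example', 'User', 'M.D.A. Chowk', 'Punjab', '54000');

-- All users in one city: a single-partition read.
SELECT userID, firstName, lastName, street
FROM userAddressesByCity
WHERE city = 'Lahore';
```

This only supports exact matches on city; it does not provide substring search inside a free-text address.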
If you're working with addresses, a useful technique is to create a User Defined Type (UDT). UDTs are useful if you want to store a user's address in a single column. But you would still want to create a table specifically designed to serve a query by whichever column you require.
Note: You could try one table and create a secondary index on one of the columns, but secondary indexes perform poorly at scale, so I don't recommend that.

Cassandra - What is meant by - "cannot rename non primary key part"

I have created a table users as follows:
create table users (user_id text primary key, email text, first_name text, last_name text, session_token int);
I am referring to the CQL help documentation on the DataStax website.
I now want to rename the email column to "emails". But when I execute the command -
alter table users rename email to emails;
I am getting the error -
Bad Request: cannot rename non primary key part email
I am using CQL 3 . My CQLSH is 3.1.6 and C* is 1.2.8.
Why can't I rename the above column? If I run help alter table, it shows the option to rename the column. How do I rename the column?
In CQL, you can rename the column used as the primary key, but not any others. This seems opposite from what it should be, one would think that the primary key would need to stay the same and the others would be easy to change! The reason comes from implementation details.
The name of the primary key is not written into each row, rather it is stored in a different place that's easily changeable. But for non-primary key fields, the names of the fields are written into each row. In order to rename the column, the system would have to rewrite every single row.
This article has some fantastic examples and a much longer discussion of Cassandra's internals.
To borrow an example directly from the article, consider this example column family:
cqlsh:test> CREATE TABLE example (
... field1 int PRIMARY KEY,
... field2 int,
... field3 int);
Insert a little data:
cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES ( 1,2,3);
And then the Cassandra-CLI output (not CQLSH) from querying this column family:
[default@test] list example;
-------------------
RowKey: 1
=> (column=, value=, timestamp=1374546754299000)
=> (column=field2, value=00000002, timestamp=1374546754299000)
=> (column=field3, value=00000003, timestamp=1374546754299000)
The name of the primary key, "field1" is not stored in any of the rows, but "field2" and "field3" are written out, so changing those names would require rewriting every row.
So if you really still want to rename a non-primary-key column, there are basically two different strategies, and neither of them is very desirable.
Drop the column and add it back, as another poster mentioned. This has the big downside of dropping all the data in that column.
or
Create a new column family that is basically a copy of the old but with the column in question renamed and rewrite your data there. This is, of course, very computationally expensive.
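One possible way to do the second strategy from within cqlsh is sketched below, using the cqlsh COPY command to export and re-import the data. The table name users_v2 and the file name users.csv are made up for illustration, and this is only practical for tables small enough to round-trip through a CSV file.

```cql
-- New table with the renamed column.
CREATE TABLE users_v2 (
  user_id text PRIMARY KEY,
  emails text,
  first_name text,
  last_name text,
  session_token int
);

-- COPY is a cqlsh command, not part of CQL itself.
-- Export the old table, listing the columns explicitly to fix their order.
COPY users (user_id, email, first_name, last_name, session_token) TO 'users.csv';

-- Import into the new table, mapping the old email values to emails.
COPY users_v2 (user_id, emails, first_name, last_name, session_token) FROM 'users.csv';
```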
In order to RENAME the field, the only way I got it working was dropping the field first and then adding it in. So it is like this:
alter table users drop email;
alter table users add emails text;
The main purpose of the RENAME clause is to change the names of CQL 3-generated primary key and column names that are missing from a legacy table (table created with COMPACT STORAGE).
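For contrast, a rename of the primary key column of the users table from the question is the kind of statement RENAME does accept. The new name id is just an example.

```cql
-- Accepted: user_id is the primary key of users.
ALTER TABLE users RENAME user_id TO id;
```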
