Can we filter Columns in Select query using Datastax Accessor? - cassandra

With the DataStax Java driver for Cassandra, using @Accessor for a select query, can we filter columns in the query itself instead of always using select *?
All the examples and docs I see select all the columns.
For example:
Select column_A from table

As this example shows, the query is in your hands. There is a * in it and you can change it to whatever you like. So you can change * to column_A only, but pay attention that the mapped class must have the same structure. Of course it must be a valid CQL query with all partition and clustering columns.
So let's say that you need only first_name of the User class. You would create a User class with only a first_name property, and the Accessor, which uses the mapped class under the hood, will map only the first_name property. You will get a list of User objects with a single property.
If you sometimes need the full object and sometimes only a handful of properties, a good way to do it would be to create some kind of light mapper objects with a subset of the properties and use them where needed.

Related

JOOQ join two tables with same column names

I'm running the following jOOQ query:
dsl.select().from(table).join(joinTable).on(joinCondition).where(condition).fetchInto(entityClass);
Both table and joinTable have id as the primary key name, but the result fetched into the entity class contains the id of joinTable and the rest of the columns of table. If I reorder the tables, the result is similar: I get the id from table and the rest of the columns from joinTable.
The metamodel is generated using the jooq-codegen-maven plugin.
Interestingly, I found out that using:
dsl.select(table.fields())...
solves the problem. I would expect this behavior to be the default, but it is not.

how Cql's Collection contains alternative value?

I have a question about querying a Cassandra collection.
I want to write a query that searches inside a collection.
CREATE TABLE rd_db.test1 (
testcol3 frozen<set<text>> PRIMARY KEY,
testcol1 text,
testcol2 int
)
That is the table structure, and this is the table contents.
In this situation, I want to write a CQL query that matches alternative values on the set column.
If it were SQL and testcol3 weren't a collection, I would write
select * from rd_db.test1 where testcol3 = 4 or testcol3 = 5
but this is CQL and a collection, so I tried
select * from test1 where testcol3 contains '4' OR testcol3 contains '5' ALLOW FILTERING ;
select * from test1 where testcol3 IN ('4','5') ALLOW FILTERING ;
but these two queries didn't work.
Please help.
This won't work for you for multiple reasons:
there is no OR operation in CQL
you can do only full match on the value of partition key (testcol3)
although you may create secondary indexes for fields with collection type, it's impossible to create an index for values of partition key
You need to change the data model, but you need to know in advance the queries that you will be executing. From a brief look at your data model, I would suggest rolling out the set field into multiple rows, with individual values corresponding to individual partitions.
I would also suggest taking the DS201 & DS220 courses on the DataStax Academy site for a better understanding of how Cassandra works and how to model data for it.
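The suggestion to roll the set out into rows could look roughly like the sketch below. It only builds CQL strings in Java, so it runs without a cluster. The table name test1_by_value and its value column are hypothetical, chosen here for illustration, and the choice of testcol1 as the clustering column is an assumption as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: test1_by_value and its "value" column are hypothetical names.
class SetRollout {
    // One partition per set element, so every element can be matched exactly.
    static final String CREATE_TABLE =
        "CREATE TABLE rd_db.test1_by_value ("
        + "value text, testcol1 text, testcol2 int, "
        + "PRIMARY KEY (value, testcol1))";

    // Doubles single quotes so the text is safe inside a CQL string literal.
    static String escape(String s) {
        return s.replace("'", "''");
    }

    // Expands one logical record into one INSERT per element of its set.
    static List<String> inserts(Set<String> values, String testcol1, int testcol2) {
        List<String> statements = new ArrayList<>();
        for (String v : new TreeSet<>(values)) { // sorted only to keep the output stable
            statements.add(
                "INSERT INTO rd_db.test1_by_value (value, testcol1, testcol2) VALUES ('"
                + escape(v) + "', '" + escape(testcol1) + "', " + testcol2 + ")");
        }
        return statements;
    }

    // "contains 4 or contains 5" becomes an IN restriction on the partition key.
    static String selectAny(Set<String> values) {
        List<String> literals = new ArrayList<>();
        for (String v : new TreeSet<>(values)) {
            literals.add("'" + escape(v) + "'");
        }
        return "SELECT * FROM rd_db.test1_by_value WHERE value IN ("
            + String.join(", ", literals) + ")";
    }
}
```

A record whose set holds both '4' and '5' lives in both partitions, so it comes back twice and the client has to deduplicate. In real code, prepared statements with bound values would be preferable to string building.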

Cassandra : Using output of one query as input to another query

I have two tables: one is users and the other is expired_users.
users columns-> id, name, age
expired_users columns -> id, name
I want to execute the following query.
delete from users where id in (select id from expired_users);
This query works fine with SQL databases. I want to find a way to do this in Cassandra.
PS: I don't want to add any extra columns in the tables.
While designing a Cassandra data model, we cannot think exactly as we would for an RDBMS.
Design it like this:
create table users (
id int,
name text,
age int,
expired boolean static,
primary key (id,name)
);
To mark a user as expired, just insert the same row again:
insert into users (id,name,age,expired) values (100,'xyz',80,true);
You don't have to update or delete the row, just insert it again; the previous column values will be overwritten.
What you want is to use a join as a filter for your delete statement, and this is not what the Cassandra model is built for.
AFAIK there is no way to do this using CQL. If you want to do it without changing the schema, run an external script in any language that has a driver for Cassandra.
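As a rough sketch of that external script, the two steps are: read the ids from expired_users, then issue deletes against users. The Java below covers only the second step and builds the CQL text, so it runs without a driver. It assumes id is an int and is the partition key of users; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: the ids are assumed to have been read beforehand with
// "SELECT id FROM expired_users" through whatever driver is in use.
class ExpiredUserCleanup {
    // Builds DELETE statements for users, with at most batchSize ids per statement.
    static List<String> deleteStatements(List<Integer> expiredIds, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<String> statements = new ArrayList<>();
        for (int start = 0; start < expiredIds.size(); start += batchSize) {
            int end = Math.min(start + batchSize, expiredIds.size());
            List<String> chunk = new ArrayList<>();
            for (Integer id : expiredIds.subList(start, end)) {
                chunk.add(String.valueOf(id));
            }
            statements.add(
                "DELETE FROM users WHERE id IN (" + String.join(", ", chunk) + ")");
        }
        return statements;
    }
}
```

Keeping the IN lists small avoids putting too many partitions into a single request; the right batch size depends on the cluster and is left as a parameter here.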

Understanding Cassandra Data Model

I have recently started learning No-SQL and Cassandra through this article. The author explains the data model through this diagram:
The author also gives the below column family example:
Book {
key: 9352130677 { name: "Hadoop The Definitive Guide", author: "Tom White", publisher: "Oreilly", priceInr: 650, category: "hadoop", edition: 4 },
key: 8177228137 { name: "Hadoop in Action", author: "Chuck Lam", publisher: "manning", priceInr: 590, category: "hadoop" },
key: 8177228137 { name: "Cassandra: The Definitive Guide", author: "Eben Hewitt", publisher: "Oreilly", priceInr: 600, category: "cassandra" },
}
But in that tutorial and every other tutorial I have gone through, they end up creating regular tables in Cassandra. I am unable to connect the Cassandra model with what I am creating.
For example, I created a column family called Employee as below:
create columnfamily Employee(empid int primary key,empName text,age int);
I then inserted some data, and my column family looks like this:
To me this looks like a regular relational table and not like the data model the author has explained. How do I create an Employee column family where each row represents an employee with different attributes? Something like:
Employee{
101:{name:Emp1,age:20}
102:{name:Emp2,salary:1000}
102:{manager_name:Emp3,age:45}
}
You need to understand that in the CQL representation it may look like a regular relational table, but the internal structure of the rows in Cassandra is completely different. It saves a different set of attributes for each employee, and the nulls you see while querying with CQL are just a representation of empty or nonexistent cells.
What you are trying to achieve is an unstructured data model. Cassandra started with this model, and everything worked as described in the tutorial you've read, but there is an opinion that unstructured data design is unhealthy for development and creates more problems than it solves. So, after some time, Cassandra moved to a "structured" data model (and from Thrift to CQL). It doesn't mean that you have to store all attributes for all keys/rows, and it doesn't mean that all rows have the same number of attributes; it just means that you have to declare attributes before you use them.
You can achieve some kind of unstructured data modeling using the Map, List, Set, etc. data types, UDTs (user-defined types), or by just saving your data as a JSON string and parsing it on the application side.
What you have understood is correct. Just believe it. Internally Cassandra stores columns exactly like the image in your question.
Now, what you are expecting is to insert a column that was not defined when creating the Employee table. For dynamic columns, you can always use the Map data type.
For example
create table Employee(
empid int primary key,
empName text,
age int,
attributes Map<text,text>);
To add new attributes you can use a query like the one below.
UPDATE Employee SET attributes = attributes + { 'manager_name' : 'Emp3', 'age' : '45' } WHERE empid = 102;
Update -
Another way to create a dynamic column model is as below:
create table Employee(
empid int,
empName text,
attribute text,
attributevalue text,
primary key (empid,empName,attribute)
);
Let's take a few inserts:
insert into Employee (empid,empName,attribute,attributevalue) values (102,'Emp1','age','25') ;
insert into Employee (empid,empName,attribute,attributevalue) values (102,'Emp1','manager','emp2') ;
insert into Employee (empid,empName,attribute,attributevalue) values (102,'Emp1','department','hr') ;
This data structure creates a wide row and behaves like dynamic columns. You can see that the primary key parts empid and empName are common to all three rows; only the attribute and its value change.
Hope this will help
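On the application side, the rows of the attribute/attributevalue table would typically be folded back into one attribute map per employee. A minimal Java sketch of that step follows; Row is a hypothetical stand-in for a driver result row, not a driver class.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: Row is a hypothetical stand-in for one row returned by
// "SELECT empid, attribute, attributevalue FROM Employee".
class EmployeeAttributes {
    static final class Row {
        final int empid;
        final String attribute;
        final String value;

        Row(int empid, String attribute, String value) {
            this.empid = empid;
            this.attribute = attribute;
            this.value = value;
        }
    }

    // Folds the attribute rows into one attribute map per employee id.
    static Map<Integer, Map<String, String>> byEmployee(List<Row> rows) {
        Map<Integer, Map<String, String>> result = new LinkedHashMap<>();
        for (Row r : rows) {
            result.computeIfAbsent(r.empid, k -> new LinkedHashMap<>())
                  .put(r.attribute, r.value);
        }
        return result;
    }
}
```

Fed the three inserts above, this yields a single entry for employee 102 whose map holds age, manager and department.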
Cassandra uses a special primary key called a composite key. This is the representation of the partitions, and it is also one reason why Cassandra scales well. The partition key part of the composite key is used to determine the nodes on which the rows are stored.
The result in your console may be a result set of rows, but the internal organization of Cassandra is different from that. Have you ever tried to query a table without specifying the primary key? You will quickly see that you can't query that flexibly (because of the partitioning).
After that you will understand why we have to use a query-first design approach for Cassandra. This is completely different from an RDBMS.

Cassandra Hierarchy Data Model

I'm a newbie at designing Cassandra data models and I need some help to think outside the box.
Basically I need a hierarchical table, something pretty standard when talking about employees.
You have an employee, say Big Boss, who has a list of employees under him.
Something like:
create table employee(id timeuuid, name text, employees list<employee>, primary key(id));
So, is there a way to model a hierarchy in Cassandra by using the table's own type as a column type, or is there another approach?
When I try the line above it gives me
Bad Request: line 1:61 no viable alternative at input 'employee'
EDITED
I was thinking about 2 possibilities:
Store a uuid instead, and in my Java application look up each Employee by uuid when loading the "boss".
Work with a Map, where the uuid is the id itself and the text is the entire row. Then in my Java application I would get the map, convert each "text" employee into an Employee entity and finally return the whole object.
It really depends on your queries. One particular model will only be good for one set of queries, but not for others.
You can store ids and look them up again on the client side. This means n extra queries for each "query". This may or may not be a problem, as queries that hit a single partition are fast. Using a map from id to name is also an option. This means you do extra work and denormalise the names into the map values. That's also valid. A third option is to use a UDT (user-defined type). You could then have a list, a set or even a map of them. In Cassandra 2.1, you can index the map keys/values as well, allowing for some quite flexible querying.
https://www.datastax.com/documentation/cql/3.1/cql/cql_using/cqlUseUDT.html
One more approach could be to store a person under their id, with static columns for their own attributes, and to have the "children" as columns in wide row format.
This could look like
create table person(
id int,
name text static,
age int static,
employee_id int, -- illustrative clustering column: one row per child
employee frozen<employeeudt>,
primary key (id, employee_id)
);
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refStaticCol.html
Querying this will give you rows with the static properties repeated, but on disk they are still held only once. You can resolve the rest on the client side.
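For the first option above (store ids and look them up again on the client side), the client-side resolution could be sketched as below. The in-memory map stands in for one query per id, and EmployeeRow with its reportIds field is a hypothetical row shape, not something from the question's schema. The sketch assumes the hierarchy contains no cycles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

class Hierarchy {
    // Hypothetical row shape: employee(id, name, report_ids list<timeuuid>).
    static final class EmployeeRow {
        final UUID id;
        final String name;
        final List<UUID> reportIds;

        EmployeeRow(UUID id, String name, List<UUID> reportIds) {
            this.id = id;
            this.name = name;
            this.reportIds = reportIds;
        }
    }

    // Fully resolved employee, as the application would use it.
    static final class Employee {
        final String name;
        final List<Employee> reports = new ArrayList<>();

        Employee(String name) {
            this.name = name;
        }
    }

    // Resolves an employee and, recursively, everyone under them.
    // Each table.get(...) stands in for one "SELECT ... WHERE id = ?" query.
    static Employee load(UUID id, Map<UUID, EmployeeRow> table) {
        EmployeeRow row = table.get(id);
        if (row == null) {
            return null; // unknown id: the caller skips it
        }
        Employee employee = new Employee(row.name);
        for (UUID reportId : row.reportIds) {
            Employee report = load(reportId, table);
            if (report != null) {
                employee.reports.add(report);
            }
        }
        return employee;
    }
}
```

Every level of the tree costs one lookup per child, which is the "n extra queries" trade-off mentioned above.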
