I have been struggling for some time to perform a GROUP BY.
Even after reading the examples, I still cannot get a correct GROUP BY to work.
My table is created like this:
CREATE TABLE IF NOT EXISTS itvance.identityuserdata (
id TEXT,
type TEXT,
tenantId TEXT,
version INT,
data JSONB,
PRIMARY KEY((id, type, tenantId), version))
WITH transactions ={'enabled': true }
Please use the query below:
select id, type, tenantid, version
from identityuserdata group by id, type, tenantid, version;
But since you are not aggregating anything, you do not need GROUP BY. You can just use the DISTINCT clause:
select distinct id, type, tenantid, version
from identityuserdata;
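The following plain-Python toy model (not Cassandra itself; the sample rows are made up) shows why the two queries agree. Grouping by every selected column without an aggregate produces exactly one row per distinct tuple, which is what DISTINCT does. Because (id, type, tenantid, version) is the table's full primary key, every row is already unique, so a plain SELECT of those columns returns the same set.

```python
from itertools import groupby

# Made-up sample rows standing in for identityuserdata (the data column is omitted).
# (id, type, tenantid, version) is the table's full primary key, so each tuple is unique.
ROWS = [
    ("u1", "user", "t1", 1),
    ("u1", "user", "t1", 2),
    ("u2", "admin", "t1", 1),
]


def select_distinct(rows):
    """Mimic SELECT DISTINCT: keep each tuple once, in first-seen order."""
    seen = set()
    result = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


def group_by_all_columns(rows):
    """Mimic GROUP BY on every selected column with no aggregate: one row per group."""
    return [key for key, _ in groupby(sorted(rows))]


# Both yield one row per distinct tuple; with unique primary keys that is every row.
print(sorted(select_distinct(ROWS)) == sorted(group_by_all_columns(ROWS)) == sorted(ROWS))  # True
```

If your engine follows Apache Cassandra's rule that SELECT DISTINCT may only request partition key and static columns, use the plain SELECT instead, because version is a clustering column.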
Related
In Cassandra, I understand that tables are supposed to be created according to what needs to be queried. For example, I currently have a Users table and a Users_By_Status table.
##Users##
CREATE TABLE Users (
user_id uuid,
name text,
password text,
status int,
username text,
PRIMARY KEY (user_id)
);
CREATE INDEX user_username_idx ON Users (username);
##Users_By_Status##
CREATE TABLE Users_By_Status (
username text,
status int,
user_id uuid,
PRIMARY KEY (username, status, user_id)
);
In this case, if a user leaves, their record won't be deleted. Instead, status will be changed from 1 to 0.
If I insert data into the Users table, do I need to manually insert the data into Users_By_Status table too? What happens if I update the status in Users? Do I need to manually update the record in Users_By_Status table too?
I have a feeling I'm misunderstanding Cassandra. I'd appreciate all the help I can get.
Short answer: yes, in your case you need to do it manually, including deleting outdated rows.
In Cassandra you need to write more code in your application layer to handle scenarios like that.
But there are other options, such as materialized views or BATCH statements.
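The BATCH option might look like this when a new user is created. This is a minimal sketch: `new_user_batch` is a hypothetical helper that only builds the CQL text and its parameters. It assumes the `%s` placeholder style that the DataStax Python driver uses for simple statements. A logged batch ensures that if one write is applied, the other eventually is too.

```python
from uuid import UUID

# Hypothetical helper: one logged CQL batch that writes a new user to both tables.
# %s placeholders follow the DataStax Python driver's convention for simple statements.
INSERT_USER_BATCH = (
    "BEGIN BATCH "
    "INSERT INTO Users (user_id, name, password, status, username) "
    "VALUES (%s, %s, %s, %s, %s); "
    "INSERT INTO Users_By_Status (username, status, user_id) VALUES (%s, %s, %s); "
    "APPLY BATCH"
)


def new_user_batch(user_id: UUID, name: str, password: str, username: str, status: int = 1):
    """Return (query, params); params line up with the placeholders in order."""
    params = (
        user_id, name, password, status, username,  # row in Users
        username, status, user_id,                   # row in Users_By_Status
    )
    return INSERT_USER_BATCH, params


# Usage with a connected driver session (not run here):
#   query, params = new_user_batch(uuid4(), "Alice", password_hash, "alice")
#   session.execute(query, params)
```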
For your case, I think a materialized view is the best option. You can create a materialized view from your Users table (after dropping the hand-maintained Users_By_Status table, since the view takes its name), like this:
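CREATE MATERIALIZED VIEW Users_By_Status
AS SELECT status, user_id, username
FROM Users
WHERE status IS NOT NULL AND user_id IS NOT NULL
PRIMARY KEY (status, user_id);
A view's primary key must contain all of the base table's primary key columns (here user_id) plus at most one other column, and each key column needs an IS NOT NULL restriction. That is why username and status cannot both be key columns of a view on Users.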
And yes, when you update the Users table, Cassandra applies the change to the Users_By_Status view as well.
Reference: https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useCreateMV.html
Do I need to manually update the record in Users_By_Status table too?
So CoutinhoBR alluded to it, but I'll come right out and say it: you cannot update primary key values in Cassandra. That's why a DELETE is required to remove the row with the old status value, followed by a write of the new one.
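Here is a minimal sketch of that delete-then-write, again as a hypothetical helper that only builds CQL text and parameters for the DataStax Python driver's `%s` style. It assumes the caller already knows the old status, for example by reading the user from Users first. Without it, the old Users_By_Status row cannot be addressed.

```python
from uuid import UUID

# Hypothetical helper: status is part of Users_By_Status's primary key, so changing it
# means deleting the old row and inserting a new one. All three writes go in one
# logged batch so the two tables do not drift apart.
STATUS_CHANGE_BATCH = (
    "BEGIN BATCH "
    "UPDATE Users SET status = %s WHERE user_id = %s; "
    "DELETE FROM Users_By_Status WHERE username = %s AND status = %s AND user_id = %s; "
    "INSERT INTO Users_By_Status (username, status, user_id) VALUES (%s, %s, %s); "
    "APPLY BATCH"
)


def status_change_batch(user_id: UUID, username: str, old_status: int, new_status: int):
    """Return (query, params) for moving a user from old_status to new_status."""
    params = (
        new_status, user_id,            # UPDATE Users
        username, old_status, user_id,  # DELETE the old Users_By_Status row
        username, new_status, user_id,  # INSERT the new Users_By_Status row
    )
    return STATUS_CHANGE_BATCH, params


# Usage (not run here): session.execute(*status_change_batch(uid, "alice", 1, 0))
```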
When I tried to query the table using the CONTAINS keyword, it reported "Cannot use CONTAINS relation on non collection column col1", but when I tried to create the table using
CREATE TABLE test (id int,address map<text, int>,mail list<text>,phone set<int>,primary key (id,address,mail,phone));
it prompts "Invalid collection type for PRIMARY KEY component phone"
One of the basics in Cassandra is that you can't modify primary keys. Always keep that in mind.
You can't use a collection as a primary key component unless it is frozen, meaning it is treated as a single immutable value.
This will work
CREATE TABLE test (id int, address frozen<map<text, int>>, mail frozen<list<text>>, phone frozen<set<int>>, primary key (id, address, mail, phone));
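As an analogy only (this is Python, not Cassandra), Python dict keys follow the same rule. A key must not change once stored, so a tuple containing a mutable set is rejected. A frozenset is accepted and compared as one whole value, much like a frozen<set<int>> key column. The phone number is made up.

```python
# Analogy only: dict keys, like primary key components, must be immutable.
table = {}


def can_use_in_key(phone):
    """Try to store a row under the key (id, phone); report whether it was accepted."""
    try:
        table[(1, phone)] = "row"
        return True
    except TypeError:  # a tuple containing a mutable set is unhashable
        return False


print(can_use_in_key({5551234}))             # False
print(can_use_in_key(frozenset({5551234})))  # True
# Frozen values compare as a whole, regardless of element order.
print(frozenset({1, 2}) == frozenset({2, 1}))  # True
```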
However, I think you should take a look at this document: http://www.datastax.com/dev/blog/cql-in-2-1
Since Cassandra 2.1 (the release the linked post covers), you can create secondary indexes on collections. You may want to use that functionality.
I have an item such as:
"Item": {
"model": 32590038899,
"date": "10-09-2015",
"price":100
}
"hash_key_attribute_name": "model"
"range_key_attribute_name": "date"
My problem is that new items are inserted frequently, so items with the same model number may arrive that are not the same product. This may be due to the regions where the product is available. So I need a setup that keeps a copy of the product when this happens, so that later, after inspection or when required, I can bring the item back. I am looking for a kind of versioning system. Currently the existing product is lost because the new item has the same primary key.
Put simply: if your primary key isn't unique, then you aren't using the correct field as your primary key.
...there is chance that items with same model number may come and they
may not be same product.This may due to regions where the product is
available...
It sounds like your primary key needs to be a composite of region and product ID. Or perhaps you need a separate table for each region.
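This toy model sketches the composite-key idea. A dict stands in for the DynamoDB table, and `put` replaces any item with the same primary key, as PutItem does. DynamoDB keys have at most a hash and a range attribute, so the "region#model" hash key is an assumed way to fold region into the key. The regions and the second price are made up.

```python
# Toy model only: a dict stands in for the table; put() overwrites on a key collision.
def put(table, item, key_fn):
    table[key_fn(item)] = item  # same key: the previous item is silently replaced


def model_date_key(item):
    return (item["model"], item["date"])


def region_model_date_key(item):
    # Assumed scheme: hash key "region#model", range key date.
    return (f'{item["region"]}#{item["model"]}', item["date"])


us = {"region": "us", "model": 32590038899, "date": "10-09-2015", "price": 100}
eu = {"region": "eu", "model": 32590038899, "date": "10-09-2015", "price": 90}

current, regional = {}, {}
for item in (us, eu):
    put(current, item, model_date_key)
    put(regional, item, region_model_date_key)

print(len(current), len(regional))  # 1 2
```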
I would add a unique id attribute to your item; a UUID works well. For example:
{ id: "53f382ae-94b8-4910-b8f1-384f46dc10d8",
model: 32590038899,
date: "10-09-2015",
price:100
}
Change your table schema to use only a hash key: id.
Add a global secondary index with model as its hash key and date as its range key.
Unlike a table's primary key, a global secondary index key does not have to be unique, so the index can hold several items with the same model and date. With this schema you prevent items from being overwritten, and you can still query for all items with a given model number.
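The schema above could be expressed as the keyword arguments for boto3's DynamoDB `client.create_table()`. In this sketch, the table name, the index name, and on-demand billing are assumptions for illustration. The function only builds the dictionary, so it runs without AWS access.

```python
# Sketch of create_table() arguments: unique id as the table key, model/date as a GSI.
# Table name, index name, and PAY_PER_REQUEST billing are illustrative assumptions.
def products_table_spec(table_name="Products", index_name="model-date-index"):
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "id", "AttributeType": "S"},
            {"AttributeName": "model", "AttributeType": "N"},
            {"AttributeName": "date", "AttributeType": "S"},
        ],
        # Table key: only the unique id, so two items never collide.
        "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                # Index key: model + date; several items may share these values.
                "KeySchema": [
                    {"AttributeName": "model", "KeyType": "HASH"},
                    {"AttributeName": "date", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


# Usage (needs AWS credentials; not run here):
#   import boto3
#   boto3.client("dynamodb").create_table(**products_table_spec())
```

Because "date" is a DynamoDB reserved word, a key condition expression written as a string needs an ExpressionAttributeNames placeholder for it when you query the index.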
I'm trying to provide a service for user validation of table structures, one component of which is column data type, like uuid, text, and bigint in the `CREATE TABLE` statement below.
USE my_keyspace;
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
age bigint);
If I do
USE system;
SELECT validator FROM schema_columns
WHERE keyspace_name='my_keyspace' AND columnfamily_name='users';
I get
org.apache.cassandra.db.marshal.UUIDType
org.apache.cassandra.db.marshal.UTF8Type
org.apache.cassandra.db.marshal.LongType
Which seems informative, but on closer inspection, multiple distinct datatypes can map to the same validator value. Is there a way I can pull the data type info as entered in the `CREATE TABLE` statement, or at least find some distinction between the types?
Also, I'm curious as to why the validator data has the 'org.apache.cassandra...' prepended to it, and couldn't find an explanation, so if anybody knows why that is, I'd be very interested to know.
Which seems informative, but on closer inspection, multiple distinct datatypes can map to the same validator value.
If that is the case, as with varchar and text, I believe those types are aliases of one another and interchangeable. Anyone, correct me if I am wrong.
Is there a way I can pull the data type info as entered in the `CREATE TABLE` statement, or at least find some distinction between the types?
The only way I know would be:
DESC TABLE users;
Also, I'm curious as to why the validator data has the 'org.apache.cassandra...' prepended to it, and couldn't find an explanation, so if anybody knows why that is, I'd be very interested to know.
Cassandra is implemented in Java, and this is the fully qualified name of the Java class that implements the data type.
More info:
http://docs.datastax.com/en/cql/3.0/cql/cql_reference/cql_data_types_c.html
https://github.com/apache/cassandra/tree/trunk/src/java/org/apache/cassandra/db/marshal
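For the validation service, a validator string can be mapped back to a CQL type name by dropping the Java package prefix. The following is a minimal sketch with a hypothetical helper. It covers only a handful of simple types and returns None for anything else, such as collection types. UTF8Type stays ambiguous because text and varchar share it.

```python
# Hypothetical helper: pre-3.0 system.schema_columns validator -> CQL type name.
PREFIX = "org.apache.cassandra.db.marshal."

SIMPLE_TYPES = {
    "UUIDType": "uuid",
    "TimeUUIDType": "timeuuid",
    "UTF8Type": "text",  # varchar maps to the same class
    "AsciiType": "ascii",
    "LongType": "bigint",
    "Int32Type": "int",
    "BooleanType": "boolean",
    "DoubleType": "double",
}


def validator_to_cql(validator: str):
    """Return the CQL type name, or None for types not listed in SIMPLE_TYPES."""
    if validator.startswith(PREFIX):
        class_name = validator[len(PREFIX):]
    else:
        class_name = validator
    return SIMPLE_TYPES.get(class_name)


print(validator_to_cql("org.apache.cassandra.db.marshal.LongType"))  # bigint
```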
Use the following query (Cassandra 3.0 and later, where the schema tables live in the system_schema keyspace):
select column_name,type from system_schema.columns where keyspace_name ='my_keyspace' AND table_name='users';
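The rows from that query could feed a check like the following. This is a minimal sketch of a hypothetical `find_type_mismatches` helper that takes (column_name, type) pairs and the expected column types. The sample rows are made up. Because varchar is an alias for text, expected types may need the same normalization before comparison.

```python
# Hypothetical validation helper. `rows` is an iterable of (column_name, type) pairs,
# e.g. the result of the system_schema.columns query; `expected` maps columns to CQL types.
def find_type_mismatches(expected, rows):
    actual = {name: cql_type for name, cql_type in rows}
    problems = []
    for column, want in sorted(expected.items()):
        got = actual.get(column)
        if got is None:
            problems.append(f"{column}: missing")
        elif got != want:
            problems.append(f"{column}: expected {want}, found {got}")
    return problems


expected = {"id": "uuid", "name": "text", "age": "bigint"}
rows = [("id", "uuid"), ("name", "text"), ("age", "int")]
print(find_type_mismatches(expected, rows))  # ['age: expected bigint, found int']
```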
Here is a Cassandra table:
CREATE TABLE Account(
id uuid,
userRef uuid,
name map<text, text>,
dataStatus text,
dataVisibility text,
...
PRIMARY KEY( id, dataStatus, dataVisibility, userRef)
)
CREATE INDEX idx_xxx_account_name ON Account (name);
'name' is a CQL3 column of collection type 'map'. My question is: is it possible to create a secondary index on a map column such as name?
Thanks.
As of Cassandra 1.2.6, custom indexes on collections are supported.
https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-1.2.6
Your question is quite old. In Cassandra 2.1 the valid syntax is:
CREATE INDEX on Account(keys(name));
No response? I have decided to rewrite the table as follows:
CREATE TABLE Account(
id uuid,
userRef uuid,
main_name text,
other_name map<text, text>,
dataStatus text,
dataVisibility text,
...
PRIMARY KEY( id, dataStatus, dataVisibility, userRef)
)
CREATE INDEX idx_xxx_account_name ON Account (main_name);
The *_name columns could be anything, e.g., email, phone, etc. For example, main_name could be mandatory, whereas other_name could be optional.
Anyway, now I can index main_name as a text column instead of a map of text values.
To answer your initial question:
There is no support for secondary indexes on collections yet. Concretely, you could associate a set of tags to a user, but you cannot automatically index users by their tags yet. Adding that support is definitively on the roadmap but remains to be implemented.
Coming in 1.2: Collections support in CQL3
Also, I don't quite see why you use a map. Why not a simple set or list? Have a look at the reference above.
CREATE INDEX idx_name ON Account(ENTRIES(name));
This lets you query for rows that contain a particular entry (key/value pair) in the map.
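To compare the three kinds of map index discussed in this thread, here is a plain-Python toy model (not Cassandra). It shows which filter each kind supports. The sample accounts are made up. The comments give the matching CQL predicate.

```python
# Toy model of what each kind of map index lets you filter on.
ACCOUNTS = [
    {"id": 1, "name": {"en": "Alice", "fr": "Alice"}},
    {"id": 2, "name": {"en": "Bob"}},
]


def contains_value(rows, value):
    # Index on the map's values: WHERE name CONTAINS 'Bob'
    return [r["id"] for r in rows if value in r["name"].values()]


def contains_key(rows, key):
    # KEYS(name) index: WHERE name CONTAINS KEY 'fr'
    return [r["id"] for r in rows if key in r["name"]]


def has_entry(rows, key, value):
    # ENTRIES(name) index: WHERE name['en'] = 'Alice'
    return [r["id"] for r in rows if r["name"].get(key) == value]


print(contains_value(ACCOUNTS, "Bob"), contains_key(ACCOUNTS, "fr"), has_entry(ACCOUNTS, "en", "Alice"))  # [2] [1] [1]
```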