I'm interested in studying the columnar store in MemSQL, so I'm trying to create columnstore tables. The query I used is:
CREATE TABLE students (
    stud_id INT,
    stud_group INT,
    joining_date DATETIME,
    KEY (`stud_group`) USING CLUSTERED COLUMNSTORE,
);
But the query throws an error at CLUSTERED COLUMNSTORE, and I don't know what causes it.
The reason is the trailing comma after COLUMNSTORE. It should be:
CREATE TABLE students (
    stud_id INT,
    stud_group INT,
    joining_date DATETIME,
    KEY (`stud_group`) USING CLUSTERED COLUMNSTORE
);
UPDATE: apparently not. Then the only reason this can happen is that you are using an old version of MemSQL (before 4.0).
To see the version of MemSQL, run SELECT @@memsql_version;
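If your client rejects that form, MemSQL speaks the MySQL wire protocol, so the MySQL-style variant below should also work (a hedged sketch; I haven't verified it on every version):

-- Assumes the version is exposed through SHOW VARIABLES, as in MySQL:
SHOW VARIABLES LIKE 'memsql_version';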
For my chat table design in Cassandra I have the following schema:
USE zwoop_chat;

CREATE TABLE IF NOT EXISTS public_messages (
    chatRoomId text,
    date timestamp,
    fromUserId text,
    fromUserNickName text,
    message text,
    PRIMARY KEY ((chatRoomId, fromUserId), date)
) WITH CLUSTERING ORDER BY (date ASC);
The following query:
SELECT * FROM public_messages WHERE chatroomid=? LIMIT 20
Results in the typical message:
Cannot execute this query as it might involve data filtering and thus
may have unpredictable performance. If you want to execute this query
despite the performance unpredictability, use ALLOW FILTERING;
Obviously I'm doing something wrong with the partitioning here.
I'm not experienced with Cassandra, and I'm a bit confused by online suggestions that Cassandra will do an entire table scan; that's something I don't really get. Realistically, why would I want to fetch an entire table?
Another suggestion I read about is to partition further, e.g. to fetch the latest messages per day. But this doesn't work for me: you don't know when the latest chat message occurred.
It could be in the last day, the last hour, or the last week or month for that matter.
I'm pretty much used to SQL, or NoSQL stores like Mongo, but this simple use case seems to be a problem for Cassandra. So what is the recommended approach here?
Edit:
It seems that it is common practice to add a bucket integer to the partition key (see the sketch below).
Let's say I create a bucket per 50 messages; is there a way to auto-increment it when the bucket is full?
I would prefer not having to fetch the MAX bucket and calculate when the bucket is full; that seems like bad performance for inserts.
It also seems like a bad idea to manage the buckets in Java: things like app restarts or load balancing would require extra logic.
(I currently use Java Spring JPA for Cassandra.)
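For reference, a bucketed variant of the table would look roughly like this. The bucket column, its sizing, and the step-down read logic are assumptions about the usual pattern, not something Cassandra provides out of the box:

CREATE TABLE IF NOT EXISTS public_messages_bucketed (
    chatRoomId text,
    bucket int,          -- assigned by the application, e.g. one bucket per N messages or per time window
    date timestamp,
    fromUserId text,
    fromUserNickName text,
    message text,
    PRIMARY KEY ((chatRoomId, bucket), date)
) WITH CLUSTERING ORDER BY (date DESC);

-- Reading the latest 20 messages: query the current bucket, and if fewer
-- than 20 rows come back, repeat the query with bucket - 1.
SELECT * FROM public_messages_bucketed
WHERE chatRoomId = ? AND bucket = ?
LIMIT 20;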
It works without bucketing using the following table design:
USE zwoop_chat;

CREATE TABLE IF NOT EXISTS public_messages (
    chatRoomId text,
    date timestamp,
    fromUserId text,
    fromUserNickName text,
    message text,
    PRIMARY KEY ((chatRoomId), date)
) WITH CLUSTERING ORDER BY (date DESC);
I had to remove fromUserId from the partition key; I assume that every column in the partition key must be included in the WHERE clause, which is what caused the error.
The JPA query:
publicMessageRepository.findFirst20ByPkChatRoomIdOrderByPkDateDesc(chatRoomId);
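My reading is that this Spring Data method name should translate to roughly the following CQL (the ORDER BY is redundant given the table's clustering order, but it mirrors the method name):

SELECT * FROM public_messages
WHERE chatRoomId = ?
ORDER BY date DESC
LIMIT 20;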
I am trying to design an application log table in Cassandra:
CREATE TABLE log (
    yyyymmdd varchar,
    created timeuuid,
    logMessage text,
    module text,
    PRIMARY KEY (yyyymmdd, created)
);
Now when I perform the following query it works as expected:
select * from log where yyyymmdd = '20180223' LIMIT 50;
The query above is without grouping, kind of global.
Currently I have a secondary index on 'module', so I am able to perform the following:
select * from log where yyyymmdd = '20180223' AND module LIKE 'test' LIMIT 50;
Now my concern is: without the secondary index, is there an efficient way to query based on the module and fetch the data? Or is there a better design?
Also, please let me know about any performance issues in the current design.
For fetching based on module and date, the only way is to use another table, like this:
CREATE TABLE module_log (
    yyyymmdd varchar,
    created timeuuid,
    logMessage text,
    module text,
    PRIMARY KEY ((module, yyyymmdd), created)
);
This gives you a separate partition for every combination of the module and yyyymmdd values, so you won't have very wide partitions.
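A query against this table then specifies the full partition key, for example (the date value is illustrative):

SELECT * FROM module_log
WHERE module = 'test' AND yyyymmdd = '20180223'
LIMIT 50;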
Also, take into account that if you created a secondary index only on the module field, you may run into problems with too-big partitions in the index (I assume that you have a very limited number of distinct module values?).
P.S. Are you using pure Cassandra, or DSE?
Is it possible to create a Cassandra table with GeoMesa specifying keys (i.e., a composite key)? I have a Spark job that writes to Cassandra, and a composite key is necessary for the output table. I would now like to create/write that same table through the GeoMesa API instead of directly to Cassandra. The format is like this:
CREATE TABLE IF NOT EXISTS mykeyspace.testcompkey (
    pkey1 text,
    ckey1 int,
    attr1 int,
    attr2 int,
    minlat decimal,
    minlong decimal,
    maxlat decimal,
    maxlong decimal,
    updatetime text,
    PRIMARY KEY ((pkey1), ckey1)
);
Is this possible? You can see in the CREATE TABLE statement that I have a partition key and a clustering key. From what I have read, I believe GeoServer supports both simple and complex features. I am just wondering if that support also carries over into the realm of Cassandra with GeoMesa?
Thank you
GeoMesa does use composite partition and clustering keys for Cassandra tables, but the keys are not configurable by the user; they are designed to facilitate spatial/temporal/attribute CQL queries.
Keys can be seen in the index table implementations here. The columns field (for example here) defines the primary keys. Columns with partition = true are used for partitioning, the rest are used for clustering.
I'm new to Apache Cassandra; I have installed Cassandra and I'm using cqlsh on my laptop.
I created a table using:
create table userpageview (
    created_at timestamp,
    hit int,
    userid int,
    variantid int,
    primary key (created_at, hit, userid, variantid)
);
I inserted several rows into the table, but when I tried to select using a condition on each column (I mean one by one), I got an error.
Maybe my data modeling is wrong. Can anyone tell me how to approach data modeling in Cassandra?
Thanks
You need to read about partition keys and clustering keys. Cassandra works very differently from relational databases, and the types of queries you can do are much more restricted.
Some information to get you started: here and here.
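To make the restriction concrete for the table above: created_at is the partition key and hit, userid, variantid are clustering columns, so conditions must follow that order (illustrative values):

-- Works: the partition key is specified
SELECT * FROM userpageview WHERE created_at = '2018-02-23 10:00:00';

-- Works: partition key plus a prefix of the clustering columns
SELECT * FROM userpageview
WHERE created_at = '2018-02-23 10:00:00' AND hit = 1;

-- Fails without ALLOW FILTERING: the partition key is not restricted
SELECT * FROM userpageview WHERE userid = 42;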
We currently have a table defined as below:
create table tableA (
    id int,
    seqno int,
    data text,
    PRIMARY KEY ((id), seqno)
) WITH CLUSTERING ORDER BY (seqno DESC);
We need to update the type of the id column from int to text. We are wondering which of the two approaches would be the most advisable:
1. ALTER TABLE tableA ALTER id TYPE varchar; (the command succeeds, but then we have issues reading the data. Is this because ALTER TABLE doesn't update the underlying storage of the id column?)
2. COPY to/from oldtable/newtable. This works, but we have issues with the RPC timeout (which we can change). Is this a bad idea on a table across a cluster?
We have checked the online docs and these are the only two options we can find. Are there other options?
Thanks
Paul
I would say option 1 isn't really supported. If your integers don't map to actual strings, you're going to have problems; you're probably seeing key validation errors.
For option 2, you probably just need to copy smaller chunks of data for each read/write.
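For example, cqlsh's COPY options let you tune the chunking, something like this (the option values are illustrative, and tableB stands for the new table with id as text; available options vary by Cassandra version):

COPY tableA (id, seqno, data) TO 'tableA.csv' WITH PAGESIZE = 1000 AND PAGETIMEOUT = 30;
COPY tableB (id, seqno, data) FROM 'tableA.csv' WITH CHUNKSIZE = 50 AND INGESTRATE = 10000;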