MemSQL - create table as select as CLUSTERED COLUMNSTORE - singlestore

Is it possible in MemSQL to create a table using "create table xxx as select ..." with the created table xxx to be disk based (i.e. CLUSTERED COLUMNSTORE)?
I was not able to achieve this.
Thank you!

You can specify options for the table including indexes like so:
create table c (key using clustered columnstore (a)) as select a,b from t;
For more information, see http://docs.memsql.com/docs/create-table#section-create-table-select

Related

Cassandra : (3.11.11) find a string in the cassandra table column

I am a newbie to Cassandra.
I have a table (table1) with data like this:
ch1,ch2,ch3,ch4
LD,9813970,1484914,'T03103','T04014'
LD,1008203,1486104,'T03103','T04024'
I want to find a string in this Cassandra table, table1. Is there any way to search for a given string in the table's column ch4 using only the IN operator (not the LIKE operator)? A sample query would be:
select * from table1 where 'T04014' IN (ch4)
If required, the ch4 column may be included in the partition or clustering keys.
You didn't post the table schema so I'm going to assume that ch4 is not part of the primary key.
You cannot include a column in the filter unless it is part of the primary key or you have a secondary index defined on it. Be aware that secondary indexes are not always a good fit. Have a look at when to use an index for details.
The general recommendation is to denormalise and create a table specifically designed for each app query so you get the best performance out of your cluster. Cheers!

Is it possible to shard Vitess with a secondary sharding key?

We are using the Vitess database to scale and achieve horizontal sharding in MySQL. Is it possible to do secondary sharding in Vitess?
For example:
Table 1 - Agency
(
AgencyID INT,
CreatedOn DATETIME
)
Table 2 - PayrollDetails
(
AgencyID INT FOREIGN KEY TO Agency Table,
PayrollID INT,
PayrollCreatedOn DATETIME
)
We have sharded both tables with AgencyID as the sharding key, but the PayrollDetails table is very large, with more than 100 million records. We are now planning to shard the PayrollDetails table again on the PayrollCreatedOn field. The primary shard for both tables should still be based on AgencyID, but PayrollDetails should be sharded on both AgencyID and PayrollCreatedOn. How can we achieve this in Vitess?
Conceptually, the sharding key (primary vindex) is used to decide which shard a row goes to. So, it's not possible to have two sharding keys because they would dictate conflicting locations for the row.
If I understand correctly, you want to query the table using PayrollCreatedOn in the WHERE clause. For that, you can create a secondary vindex. This will create a lookup table that points at where the row lives, and Vitess can exploit that. There's an explanation for this here: https://vitess.io/docs/reference/vindexes/. There is a new command called CreateLookupVindex that is capable of backfilling this lookup table. It's yet to be documented, though.
Vitess also lets you "materialize" a table by using a different primary vindex. In that case, the second table will be a real-time copy of the first table, but sharded differently. You can see a demo for this on the vitess front page (scroll down to the video).
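As a rough conceptual sketch of the two ideas above (this is not how Vitess is implemented internally): a primary vindex maps the sharding key to exactly one shard, while a lookup vindex is an extra table that records where rows with a given secondary value live. The shard count, the hash function, and every name below are assumptions made up for illustration.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # assumed shard count, for illustration only


def primary_shard(agency_id: int) -> int:
    """Toy stand-in for a primary vindex: sharding key -> exactly one shard."""
    digest = hashlib.md5(str(agency_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


class ToyCluster:
    """In-memory model of a sharded PayrollDetails table (hypothetical)."""

    def __init__(self) -> None:
        self.shards = [[] for _ in range(NUM_SHARDS)]
        # Toy stand-in for a lookup (secondary) vindex:
        # PayrollCreatedOn value -> AgencyIDs that own rows with that value.
        self.lookup = defaultdict(set)

    def insert(self, agency_id: int, payroll_id: int, created_on: str) -> None:
        row = {
            "AgencyID": agency_id,
            "PayrollID": payroll_id,
            "PayrollCreatedOn": created_on,
        }
        # The row's location is decided by the primary sharding key alone.
        self.shards[primary_shard(agency_id)].append(row)
        # The lookup table is maintained alongside every insert.
        self.lookup[created_on].add(agency_id)

    def shards_for_created_on(self, created_on: str) -> set:
        """Shards that must be visited for WHERE PayrollCreatedOn = ..."""
        return {primary_shard(a) for a in self.lookup.get(created_on, set())}

    def query_by_created_on(self, created_on: str) -> list:
        rows = []
        for shard in sorted(self.shards_for_created_on(created_on)):
            rows.extend(
                r for r in self.shards[shard]
                if r["PayrollCreatedOn"] == created_on
            )
        return rows
```

The point of the sketch is that a query on PayrollCreatedOn only visits the shards named by the lookup table instead of scattering to every shard, while each row still has a single home determined by AgencyID.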

Creating secondary index on table in Cassandra

I have just started working with Cassandra.
I am a bit confused by the concept of a secondary key.
From the definition, I understand that a secondary index is an index on a non-key attribute of a table, which is not sorted.
So I have this table
CREATE TABLE IF NOT EXISTS userschema.user (id int,name text, address text, company text, PRIMARY KEY (id, name))
So if I create an index like this
CREATE INDEX IF NOT EXISTS user_name_index ON userschema.user (name)
this should be a secondary index.
But my requirement is to create an index containing the columns name, id, and company.
How can I create a secondary index like this in Cassandra?
I found this link, which describes something of this sort, but how are these secondary indexes? Aren't they just tables?
The user table above is just an example, not the actual one.
I am using Cassandra 3.0.9
id and name are already part of primary key.
So following queries will work
SELECT * FROM table WHERE id=1
SELECT * FROM table WHERE id=1 and name='some value'
SELECT * FROM table WHERE name='some value' ALLOW FILTERING (This is inefficient)
You can create a secondary index on the company column
CREATE INDEX IF NOT EXISTS company_index ON userschema.user (company)
Once the secondary index is defined, it can be used in the WHERE clause along with the primary key.
SELECT * FROM table WHERE id=1 and name='some value' and company='some value'
Though SELECT * FROM table WHERE company='some value' ALLOW FILTERING works, it will be highly inefficient.
Before creating a secondary index, have a look at When to use secondary index in cassandra
The link you referred to mainly focuses on materialized views, in which we create virtual tables to execute queries on non-primary-key columns.
Moreover, it seems you are creating a secondary index on a primary key column, which you already defined when creating the table. Always remember that a secondary index should be on a non-primary-key column.
To get a clear idea about secondary indexes, refer to https://docs.datastax.com/en/cql/3.3/cql/cql_using/useSecondaryIndex.html
Now, the pros and cons of the alternatives to a secondary index:
1. Materialized views:
This creates new virtual tables. You query the original table by its old primary key and the new materialized table by its new primary key. Any data modification in the original table is reflected in the materialized table. If you drop the materialized table, its data will be created as tombstones, whose gc_grace_seconds is 864000 (10 days) by default. Dropping the materialized table has no effect on the original table.
2. ALLOW FILTERING:
It is highly inefficient, and using ALLOW FILTERING is not advised at all, as latencies will be high and performance will be degraded.
If you want more information, refer to this link too: How do secondary indexes work in Cassandra?
Correct me if I am wrong.

How to change PARTITION KEY column in Cassandra?

Suppose we have such table:
create table users (
id text,
roles set<text>,
PRIMARY KEY ((id))
);
I want all the values of this table to be stored on the same Cassandra node (OK, not really a single node but the same 3 replicas, with all the data mirrored, but you get the point). To achieve that, I want to change this table to look like this:
create table users_v2 (
partition int,
id text,
roles set<text>,
PRIMARY KEY ((partition), id)
);
How can I do that without losing the data from the first table?
It seems to be impossible to use ALTER TABLE to add such a column. I'm OK with that.
What I am trying to do is copy the data from the first table and insert it into the second table.
When I do it as is, the partition column is missing, which is expected.
I can ALTER the first table and add a 'partition' column at the end, and then COPY in the correct order, but I can't update all the rows in the first table to set them all to the same partition value, and there seems to be no "default" value when a column is added.
You simply cannot alter the primary key of a Cassandra table. You need to create another table with your new schema and perform a data migration. I would suggest that you use Spark for that, since it is really easy to do a migration between two tables with only a few lines of code.
This also answers the question about altering the primary key.
If you do not have a lot of data in the table, there is another way.
In the "DataStax Dev Center" utility, select the table and use the command "Export All result to file as INSERT". It will save all the data from the table to a file as INSERT CQL statements.
Then you should drop the table, create a new one with the new PARTITION KEY, and finally fill it by running the statements from the file via CQL.
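For a small table, the copy step described in the answers above could be sketched roughly as follows: read the rows of the old users table by whatever means you have, then generate INSERT statements for users_v2 with a constant partition value. This is only a minimal sketch; the helper names, the dict-shaped rows, and the default partition value 0 are assumptions for illustration, and nothing here talks to a real cluster.

```python
def cql_text(value: str) -> str:
    """Quote a string as a CQL text literal (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def cql_text_set(values) -> str:
    """Render an iterable of strings as a CQL set<text> literal."""
    return "{" + ", ".join(cql_text(v) for v in sorted(values)) + "}"


def users_v2_insert(row: dict, partition: int = 0) -> str:
    """Build one INSERT for users_v2 from a row of the old users table.

    `row` is assumed to look like {"id": "...", "roles": {...} or None}.
    """
    roles = row.get("roles") or ()  # a missing or null set becomes {}
    return (
        "INSERT INTO users_v2 (partition, id, roles) "
        f"VALUES ({partition}, {cql_text(row['id'])}, {cql_text_set(roles)});"
    )


def migration_script(rows, partition: int = 0) -> str:
    """Join the generated statements into a script that cqlsh could run."""
    return "\n".join(users_v2_insert(r, partition) for r in rows)
```

Because every generated row carries the same partition value, all rows end up in one partition of users_v2, which is what the question asks for.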

Cassandra Data Modelling approach

I have the following initially designed static column family in Cassandra:
create table APP_DATA (
CODE varchar,
DATA varchar,
CREATED_DT timestamp,
REQUEST_TYPE int,
STATUS int,
..... #Some more columns ...,
PRIMARY KEY ((CODE,DATA),CREATED_DT))
with clustering order by (CREATED_DT desc);
Now I want to run the queries below:
1) SELECT
SELECT * FROM APP_DATA WHERE CODE='1' AND DATA='1111111111';
SELECT * FROM APP_DATA WHERE CODE='1' AND DATA='1111111111' AND CREATED_DT<=dateof(now()) AND STATUS=0;
SELECT * FROM APP_DATA WHERE CODE='1' AND DATA='1111111111' AND CREATED_DT<=dateof(now()) AND STATUS=0 AND REQUEST_TYPE=9;
2) DELETE
DELETE FROM APP_DATA WHERE CREATED_DT+5<=sysdate;
How should I proceed with data modeling ?
How should I design to make the above select and delete queries faster ?
Please guide ..
Thanks in Advance.
Hi. First of all, take the CREATED_DT column out of the PRIMARY KEY, leaving two columns in the PRIMARY KEY. Make CREATED_DT a normal column and create secondary indexes to query on.
Second, to delete data that is older than five days (CREATED_DT+5 <= sysdate), use the TTL (time to live) feature of Cassandra.
I hope it could help you.
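To make the TTL suggestion above concrete, here is a minimal sketch that computes a five-day TTL in seconds and builds a parameterised INSERT with a USING TTL clause. The helper name and the chosen column list are assumptions for illustration; the `?` placeholders stand for values bound through a prepared statement in whichever driver you use.

```python
from datetime import timedelta

# Five days, matching the CREATED_DT+5 condition in the question.
TTL_SECONDS = int(timedelta(days=5).total_seconds())


def insert_with_ttl(ttl_seconds: int = TTL_SECONDS) -> str:
    """Build a parameterised INSERT whose row expires after ttl_seconds."""
    if ttl_seconds <= 0:
        raise ValueError("TTL must be a positive number of seconds")
    return (
        "INSERT INTO APP_DATA (CODE, DATA, CREATED_DT, REQUEST_TYPE, STATUS) "
        f"VALUES (?, ?, ?, ?, ?) USING TTL {ttl_seconds};"
    )
```

With a TTL set at write time, the rows expire on their own and the periodic DELETE from the question is no longer needed.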
Here is the thing. I think your table looks good, and you do not need to take CREATED_DT out of the primary key, because you are ordering by it as DESC. In order to do that, you have to make it a clustering column.
Secondly, Cassandra data modelling is a query-driven methodology, meaning you create a table to satisfy a query. Try to avoid creating secondary indexes as much as you can, and create tables instead to satisfy the query.
Your DML should be based on the partition key.
