Can you shard a partitioned table in YugabyteDB?

[Question posted by a user on YugabyteDB Community Slack]
Can a partitioned table in YugabyteDB YSQL be sharded underneath?

Yes. Each partition of a YSQL partitioned table is a regular table underneath, and each of those tables is split into tablets (shards). If you run the SQL from this page, you can see it for yourself: https://docs.yugabyte.com/preview/explore/ysql-language-features/advanced-features/partitions/
Example:
yugabyte=# CREATE TABLE order_changes (
yugabyte(# change_date date,
yugabyte(# type text,
yugabyte(# description text
yugabyte(# )
yugabyte-# PARTITION BY RANGE (change_date);
CREATE TABLE
yugabyte=# CREATE TABLE order_changes_2019_02 PARTITION OF order_changes
yugabyte-# FOR VALUES FROM ('2019-02-01') TO ('2019-03-01');
CREATE TABLE
yugabyte=# CREATE TABLE order_changes_2019_03 PARTITION OF order_changes
yugabyte-# FOR VALUES FROM ('2019-03-01') TO ('2019-04-01');
CREATE TABLE
yugabyte=# CREATE TABLE order_changes_2020_11 PARTITION OF order_changes
yugabyte-# FOR VALUES FROM ('2020-11-01') TO ('2020-12-01');
CREATE TABLE
yugabyte=# CREATE TABLE order_changes_2020_12 PARTITION OF order_changes
yugabyte-# FOR VALUES FROM ('2020-12-01') TO ('2021-01-01');
CREATE TABLE
yugabyte=#
yugabyte=# CREATE TABLE order_changes_2021_01 PARTITION OF order_changes
yugabyte-# FOR VALUES FROM ('2021-01-01') TO ('2021-02-01');
CREATE TABLE
yugabyte=# \q
yb-admin --master_addresses 127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 list_tablets ysql.yugabyte order_changes_2019_02
Tablet-UUID Range Leader-IP Leader-UUID
4ad7671bfcdf431eb7a3246ea9fd7480 partition_key_start: "" partition_key_end: "*\252" 127.0.0.2:9100 4d88502883f64cafbff2ba745c57b1fd
2f27e6a5f5fb4722abb23d133052293f partition_key_start: "*\252" partition_key_end: "UT" 127.0.0.1:9100 d321a3d5fe444d78b5c2c8f519b65f1d
2ac7e3c873d74a46af2a80b2eb9589da partition_key_start: "UT" partition_key_end: "\177\376" 127.0.0.1:9100 d321a3d5fe444d78b5c2c8f519b65f1d
795f1351d8664509af4da88744dd1229 partition_key_start: "\177\376" partition_key_end: "\252\250" 127.0.0.3:9100 db2e8e0df9fe4a96966aa6530123984c
4133638d308b419cb0326b442c5c0e86 partition_key_start: "\252\250" partition_key_end: "\325R" 127.0.0.3:9100 db2e8e0df9fe4a96966aa6530123984c
594086be6aa147d592e28270a0f1d220 partition_key_start: "\325R" partition_key_end: "" 127.0.0.2:9100 4d88502883f64cafbff2ba745c57b1fd
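Each partition is an ordinary YSQL table, so you can also inspect its sharding from SQL rather than yb-admin. A minimal sketch, assuming a YugabyteDB version that provides the yb_table_properties() helper function:
yugabyte=# SELECT num_tablets, num_hash_key_columns
yugabyte-# FROM yb_table_properties('order_changes_2019_02'::regclass);
For the partition above this should report 6 tablets, matching the yb-admin listing.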

Related

Partitioned table inside a colocated database in YugabyteDB YSQL

[Question posted by a user on YugabyteDB Community Slack]
Is it possible to create a partitioned table in a colocated database?
When the database is created with colocated=true and I try to add a partitioned table like this:
create table test(id bigserial not null, PRIMARY KEY(id HASH)) PARTITION BY RANGE WITH (colocated = false);
I get an error:
Query 1 ERROR: ERROR: syntax error at or near "WITH" LINE 3: PRIMARY KEY(id HASH)) PARTITION BY RANGE WITH (colocated=t…
Is it possible to do this, or should I think about some other approach? I'm trying to do geo-partitioning and at the same time have some of the tables colocated.
The syntax is wrong: you need to specify which column(s) to PARTITION BY RANGE, for example PARTITION BY RANGE (id) (but then why is the primary key hashed?).
Also, a hash-sharded table can't be colocated. In your case, since the parent is a partitioned table, creating it should work once you fix the syntax error, but the hash-sharded partitions under it can't be colocated.
Taking into account the above, you can have something like:
create table new (id bigserial not null, PRIMARY KEY (id ASC)) partition by range(id);
create table new_1 partition of new for values from (5) to (10) with (colocated = false);
create table new_2 partition of new for values from (20) to (30) with (colocated = true);
You can't shard by hash if you want to set colocated=true. It works fine with colocated=false:
create table new (id bigserial not null, value text) partition by range(id);
create table new_1 partition of new (primary key(id hash)) for values from (0) to (5) with (colocated = false);
create table new_2 partition of new (primary key(id hash)) for values from (5) to (10) with (colocated = false);
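To confirm which partitions actually ended up colocated, you can query each partition's properties. A sketch, again assuming your version ships the yb_table_properties() helper with an is_colocated field:
select is_colocated from yb_table_properties('new_1'::regclass);
select is_colocated from yb_table_properties('new_2'::regclass);
In the first example above, new_1 (created with colocated = false) should report false and new_2 should report true.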

Select specific columns from Cassandra based on their name

I have a Cassandra database in which columns can be added or removed based on the application's needs. The column names start with the prefix RSSI. I was wondering if it is possible to select all columns where the column name is like %RSSI%. In MySQL you can do something like select count(*) FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'MACTrain' AND column_name LIKE '%RSSI%'. Is it possible in Cassandra? If not, what can be a solution to select columns based on a specific wildcard?
You can obtain the column metadata of a table by querying the system keyspace:
select * from system.schema_columns
where keyspace_name = 'yourks' and columnfamily_name = 'yourtable';
For Cassandra v3.0 and above, you can use the new system_schema keyspace:
select * from system_schema.columns
where keyspace_name = 'yourks' and table_name = 'yourtable';
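Note that CQL itself has no LIKE '%RSSI%' filter on these system tables, so pull back all the column names for the table and apply the wildcard client-side. A sketch of the cqlsh side:
select column_name from system_schema.columns
where keyspace_name = 'yourks' and table_name = 'yourtable';
Then keep only the names containing RSSI in your application code.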

Choosing the right schema for a Cassandra "table" in CQL3

We are trying to store lots of attributes for a particular profile_id inside a table (using CQL3) and cannot wrap our heads around which approach is best:
a. create table mytable (profile_id, a1 int, a2 int, a3 int, a4 int ... a3000 int) primary key (profile_id);
OR
b. create MANY tables, eg.
create table mytable_a1(profile_id, value int) primary key (profile_id);
create table mytable_a2(profile_id, value int) primary key (profile_id);
...
create table mytable_a3000(profile_id, value int) primary key (profile_id);
OR
c. create table mytable (profile_id, a_all text) primary key (profile_id);
and just store 3000 "columns" inside a_all, like:
insert into mytable (profile_id, a_all) values (1, "a1:1,a2:5,a3:55, .... a3000:5");
OR
d. none of the above
The type of query we would be running on this table:
select * from mytable where profile_id in (1,2,3,4,5423,44)
We tried the first approach, but the queries keep timing out and sometimes even kill Cassandra nodes.
The answer is to use a clustering column. A clustering column lets you create dynamic columns that hold the attribute name (column name) and its value (column value).
The table would be
create table mytable (
profile_id text,
attr_name text,
attr_value int,
PRIMARY KEY(profile_id, attr_name)
)
This allows you to add inserts like
insert into mytable (profile_id, attr_name, attr_value) values ('131', 'a1', 3);
insert into mytable (profile_id, attr_name, attr_value) values ('131', 'a2', 1031);
.....
insert into mytable (profile_id, attr_name, attr_value) values ('131', 'an', 2);
This would be the optimal solution.
You then want to run the query you mentioned: select * from mytable where profile_id in (1,2,3,4,5423,44). This requires 6 queries under the hood, but Cassandra should handle them in no time, especially if you have a multi-node cluster.
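Concretely, against the clustering-column table above, that query becomes the following (note the quoted ids, since profile_id is text in this schema):
select * from mytable where profile_id in ('1','2','3','4','5423','44');
And because attr_name is a clustering column, you can also fetch a single attribute for one profile directly:
select attr_value from mytable where profile_id = '131' and attr_name = 'a1';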
Also, if you use the DataStax Java Driver, you can run these requests asynchronously and concurrently on your cluster.
For more on data modelling and the DataStax Java Driver, check out DataStax's free online training. It's worth a look:
http://www.datastax.com/what-we-offer/products-services/training/virtual-training
Hope it helps.

How can I insert data into HBase through a Hive table?

I can create a Hive table with this query
CREATE TABLE hbtable(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
And I used this query for inserting data into the table, but it's not working:
insert overwrite table hbtable select * from hbtable s where s:hive fiels="value"
How can I insert values into a HBase table through Hive table?
Follow these steps.
Step 1:
bin/hive --auxpath /hadoop/projects/hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar,/hadoop/projects/hive-0.9.0/lib/hbase-0.92.0.jar,/hadoop/projects/hive-0.9.0/lib/zookeeper-3.3.4.jar,/hadoop/projects/hive-0.9.0/lib/guava-r09.jar -hiveconf hbase.master=localhost:60000
Step 2:
hive> CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
Step 3:
hive> INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM xyz WHERE key=1;
Note: I am running hive-0.9.0 and hbase-0.94.4 on a single Ubuntu box.
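For step 3 to insert anything, the SELECT must read from a Hive table that already holds your source rows (the answer assumes one named xyz exists in Hive). A minimal sketch with a hypothetical staging table and data file path:
hive> CREATE TABLE staging (key int, value string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
hive> LOAD DATA LOCAL INPATH '/tmp/data.tsv' OVERWRITE INTO TABLE staging;
hive> INSERT OVERWRITE TABLE hbase_table_1 SELECT key, value FROM staging;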

Extra column created by CQL inserts (compared to cli)

I see an extra column being created in my column family when I use CQL compared to the cli.
Create table using CQL and insert row:
cqlsh:cassandraSample> CREATE TABLE bedbugs(
... id varchar,
... name varchar,
... description varchar,
... primary key(id, name)
... ) ;
cqlsh:cassandraSample> insert into bedbugs (id, name, description)
values ('Cimex','Cimex lectularius','http://en.wikipedia.org/wiki/Bed_bug');
Now insert a column using the cli:
[default@cassandraSample] set bedbugs['BatBedBug']['C. pipistrelli:description']='google.com';
Value inserted.
Elapsed time: 1.82 msec(s).
[default@cassandraSample] list bedbugs
... ;
Using default limit of 100
Using default column limit of 100
-------------------
RowKey: Cimex
=> (column=Cimex lectularius:, value=, timestamp=1369682957658000)
=> (column=Cimex lectularius:description, value=http://en.wikipedia.org/wiki/Bed_bug, timestamp=1369682957658000)
-------------------
RowKey: BatBedBug
=> (column=C. pipistrelli:description, value=google.com, timestamp=1369688651442000)
2 Rows Returned.
cqlsh:cassandraSample> select * from bedbugs;
id | name | description
-----------+-------------------+--------------------------------------
Cimex | Cimex lectularius | http://en.wikipedia.org/wiki/Bed_bug
BatBedBug | C. pipistrelli | google.com
So CQL creates one extra, empty column for each row. Isn't that a waste of space?
That extra cell is CQL's row marker. For every row inserted through CQL into a non-compact table, Cassandra stores one empty cell named after the clustering key (here Cimex lectularius:). It guarantees the row continues to exist even if all of its non-primary-key columns are deleted, and it is invisible from CQL. Rows written through cassandra-cli never get this marker, which is why the BatBedBug row doesn't have one.
For compatibility with cassandra-cli, and to prevent this extra column from being created, change your CREATE TABLE statement to include WITH COMPACT STORAGE.
So
CREATE TABLE bedbugs(
id varchar,
name varchar,
description varchar,
primary key(id, name)
);
becomes
CREATE TABLE bedbugs(
id varchar,
name varchar,
description varchar,
primary key(id, name)
) WITH COMPACT STORAGE;
WITH COMPACT STORAGE is also how you would go about supporting wide rows in CQL.
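One caveat before adopting this: a COMPACT STORAGE table with a clustering column can have at most one column outside the primary key, so the schema above only works because description is the sole non-key column. A hypothetical second column would be rejected:
CREATE TABLE bedbugs_v2 (
id varchar,
name varchar,
description varchar,
first_seen timestamp, -- a second non-key column: not allowed with COMPACT STORAGE
primary key(id, name)
) WITH COMPACT STORAGE;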
