Vitess Sharding issue

I am new to Vitess. We have deployed Vitess on Kubernetes using the Helm charts, exposed VTGate through a NodePort, and can connect MySQL Workbench to the exposed VTGate IP address.
When we insert records through MySQL Workbench, they are not being distributed across the shards.
* 'Payrolls' is the keyspace, and we created two shards: Payrolls/-80 and Payrolls/80-.
* Table schemas are as below:
1. TenantList
(
  TenantID INT,
  Name VARCHAR(200)
)
2. PayrollMaster
(
  PayrollID INT PRIMARY KEY,
  HHAID INT REFERENCES TenantList (TenantID)
)
3. PayrollDetails
(
  PayrollDetailID INT PRIMARY KEY,
  PayrollID INT REFERENCES PayrollMaster (PayrollID),
  HHAID INT REFERENCES TenantList (TenantID)
)
* VSchema is as below
{
  "sharded": true,
  "vindexes": {
    "hash": {
      "type": "hash"
    }
  },
  "tables": {
    "payrollmaster": {
      "column_vindexes": [
        {
          "column": "HHA",
          "name": "hash"
        }
      ],
      "auto_increment": {
        "column": "PayrollInternalID",
        "sequence": "payrollmaster_seq"
      }
    },
    "payrolldetails": {
      "column_vindexes": [
        {
          "column": "HHA",
          "name": "hash"
        }
      ],
      "auto_increment": {
        "column": "PayrollDetailInternalID",
        "sequence": "payrolldetails_seq"
      }
    }
  }
}
We are inserting the records as below:
use Payrolls;
INSERT INTO TenantList ...;
INSERT INTO PayrollMaster ...;
INSERT INTO PayrollDetails ...;
We expected the records to be distributed across the shards, but the tables have zero rows in the shards.
Is there any issue with the way we are trying this?
Thanks

It looks like you want Payrolls to be sharded. If so, you shouldn't create a shard "0"; you should only create the shards "-80" and "80-".
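One way to verify where rows are landing, assuming the keyspace really is serving only those two shards, is to target each shard explicitly through VTGate. The keyspace/shard targeting below is standard Vitess syntax (show vitess_shards is supported on recent builds), and payrollmaster is the table from the question:

show vitess_shards;
-- expect exactly payrolls/-80 and payrolls/80-

use `Payrolls/-80`;
SELECT COUNT(*) FROM payrollmaster;

use `Payrolls/80-`;
SELECT COUNT(*) FROM payrollmaster;

If both counts are zero, the inserts most likely went to a shard (such as "0") that VTGate is not routing to.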

Related

How to define PRIMARY KEY in azure cosmosdb arm template

Is it possible to define a PRIMARY KEY in an Azure Cosmos DB Cassandra ARM template?
Let's say I have the following table:
CREATE TABLE foo
(
  id text,
  name text,
  PRIMARY KEY (id)
)
And my ARM template:
"schema":{
"columns":[
{
"name":"id",
"type":"text"
}
],
"partitionKeys":[
{"name":"id"} // how to define primary key ?
}
The primary key in Cassandra consists of one or more partition columns and zero or more clustering columns. In ARM templates these are defined as the partitionKeys and clusterKeys arrays of objects. Here is the example from the documentation:
"partitionKeys": [
{ "name": "machine" },
{ "name": "cpu" },
{ "name": "mtime" }
],
"clusterKeys": [
{
"name": "loadid",
"orderBy": "asc"
}
]
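Applied to the foo table above, which has only a partition key and no clustering columns, the schema block would look something like this (a sketch following the same documentation format; the name column is filled in to match the CQL definition):

"schema": {
  "columns": [
    { "name": "id", "type": "text" },
    { "name": "name", "type": "text" }
  ],
  "partitionKeys": [
    { "name": "id" }
  ]
}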

Azure CosmosDb partition key - different schema

I have an Azure Cosmos DB SQL API account with one container, "EmployeeContainer", with the partition key "personId". I have three different types of documents in this container. Their schemas are shown below:
Person collection:
{
  "Id": "1234569",
  "personId": "P1241234",
  "FirstName": "The first name",
  "LastName": "The last name"
}
Person-Department collection:
{
  "Id": "923456757",
  "personId": "P1241234",
  "departmentId": "dept01",
  "unitId": "unit01",
  "dateOfJoining": "joining date"
}
Department-Employees collection:
{
  "id": "678678",
  "departmentId": "dept01",
  "departmentName": "IT",
  "employees": [
    { "personId": "P1241234" },
    { "personId": "P1241235" },
    { "personId": "P1241236" }
  ]
}
How will the data be stored in the logical partitions? personId is the partition key, and all the document types contain personId. So will the document in the Person collection with personId "P1241234" and the document in the Person-Department collection with personId "P1241234" be stored in the same logical partition? And how will the Department-Employees data be stored?
This design is not optimal. You should combine Person and Person-Department into a single container using personId as the partition key, then have a second container for departments that has departmentId as its partition key, with each person as another row along with any other properties you need when querying that container. Do not write code that queries both containers; each should have all the data it needs to satisfy any request you make. Then use the change feed to keep them in sync.
You can get more details on how to model this in the article here.
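A sketch of what that looks like with the documents from the question (the container names and the composite id are illustrative, not part of Cosmos DB):

// "employees" container, partition key /personId:
// Person and Person-Department merged into one document.
{
  "id": "1234569",
  "personId": "P1241234",
  "FirstName": "The first name",
  "LastName": "The last name",
  "departmentId": "dept01",
  "unitId": "unit01",
  "dateOfJoining": "joining date"
}

// "departments" container, partition key /departmentId:
// one row per person, kept in sync via change feed.
{
  "id": "678678-P1241234",
  "departmentId": "dept01",
  "departmentName": "IT",
  "personId": "P1241234"
}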
Yes, that is true: documents with the same personId will be stored in the same logical partition, regardless of their type/schema. I'm not sure you can create documents without the partition key in a collection that has one defined, but if you can, they should all end up in the same logical partition.

Cassandra data modeling for server time series metrics

I am going to collect server metrics, such as OS metrics, Java process info, etc., every second.
For example, as JSON:
{
  "localhost": {
    "os": {
      "cpu": 4,
      "memory": 16
    },
    "java": {
      "jvm": {
        "vendor": "Oracle"
      },
      "heap": 4,
      "version": 1.8
    }
  }
}
What is the best data model for this kind of data?
Should I store every type of metric in a separate table, or all of them in one?
One option would be to translate each individual metric into a dotted string, so that your JSON:
{
  "localhost": {
    "os": {
      "cpu": 4,
      "memory": 16
    },
    "java": {
      "jvm": {
        "vendor": "Oracle"
      },
      "heap": 4,
      "version": 1.8
    }
  }
}
Turns into this:
Host       Key              Value
localhost  os.cpu           4
localhost  os.memory        16
localhost  java.jvm.vendor  Oracle
localhost  java.heap        4
localhost  java.version     1.8
Not shown above is a timestamp column. The primary key would be host+key+timestamp. If you don't need to be able to query by individual host, you could lump the hostname into the key, i.e. key=localhost.os.cpu.
The precise details of your querying needs weigh heavily into whether this is the right choice.
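A minimal CQL sketch of that layout (the table and column names are illustrative, not from the question):

CREATE TABLE metrics (
  host text,
  key text,
  ts timestamp,
  value text,
  PRIMARY KEY ((host, key), ts)
) WITH CLUSTERING ORDER BY (ts DESC);

-- last 60 CPU samples for one host
SELECT ts, value FROM metrics
WHERE host = 'localhost' AND key = 'os.cpu'
LIMIT 60;

Storing value as text keeps mixed types (numbers, strings) in one column; per-type columns or tables are the alternative the question asks about.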

query with multiple values for same attribute in dynamodb nodejs

Is there any way to query a DynamoDB table with multiple values for a single attribute?
TableName: "sdfdsgfdg"
IndexName: 'username-category-index',
KeyConditions: {
"username": {
"AttributeValueList": { "S": "aaaaaaa#gmail.com" }
,
"ComparisonOperator": "EQ"
},
"username": {
"AttributeValueList": { "S": "hhhhh#gmail.com" }
,
"ComparisonOperator": "EQ"
},
"category": {
"AttributeValueList": { "S": "Coupon" }
,
"ComparisonOperator": "EQ"
}
}
The BatchGetItem API can be used to get multiple items from a DynamoDB table. However, it can't be used in your use case because you are getting the data from an index.
The BatchGetItem operation returns the attributes of one or more items from one or more tables. You identify requested items by primary key.
From an API perspective, there is no other solution. You may need to look at it from a data modelling perspective and design the table/index to satisfy your query access pattern (QAP).
Also, please note that querying the index multiple times with different partition key values (i.e. some small number of them) wouldn't impact performance, as long as it is a handful of items.
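A sketch of that approach in Node.js: one Query per username, merged client-side. The table and index names are from the question; this assumes username is the index's partition key and category its sort key, as the KeyConditions above suggest.

const AWS = require("aws-sdk");
const dc = new AWS.DynamoDB.DocumentClient();

// Run one Query per username value and merge the results.
async function queryByUsernames(usernames, category) {
  const pages = await Promise.all(usernames.map(username =>
    dc.query({
      TableName: "sdfdsgfdg",
      IndexName: "username-category-index",
      KeyConditionExpression: "username = :u AND category = :c",
      ExpressionAttributeValues: { ":u": username, ":c": category }
    }).promise()
  ));
  return pages.flatMap(page => page.Items);
}

// queryByUsernames(["aaaaaaa#gmail.com", "hhhhh#gmail.com"], "Coupon");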

How to create complex structure in Cassandra with CQL3

I have a problem with representing a complex data structure in Cassandra.
A JSON example of the data:
{
  "A": {
    "A_ID": "1111",
    "field1": "value1",
    "field2": "value2",
    "field3": [
      {
        "id": "id1",
        "name": "name1",
        "segment": [
          {
            "segment_id": "segment_id_1",
            "segment_name": "segment_name_1",
            "segment_value": "segment_value_1"
          },
          {
            "segment_id": "segment_id_2",
            "segment_name": "segment_name_2",
            "segment_value": "segment_value_2"
          },
          ...
        ]
      },
      {
        "id": "id2",
        "name": "name2",
        "segment": [
          {
            "segment_id": "segment_id_3",
            "segment_name": "segment_name_3",
            "segment_value": "segment_value_3"
          },
          {
            "segment_id": "segment_id_4",
            "segment_name": "segment_name_4",
            "segment_value": "segment_value_4"
          },
          ...
        ]
      },
      ...
    ]
  }
}
Only one query will be used: find by A_ID.
I think this data should be stored in one table (column family), without serialization/deserialization operations, for more efficiency.
How can I do this if CQL does not support nested maps and lists?
Cassandra 2.1 adds support for nested structures: https://issues.apache.org/jira/browse/CASSANDRA-5590
The downside to "just store it as a JSON/Protobuf/Avro/etc. blob" is that you have to read and rewrite the entire blob to update any field. So at the very least you should pull your top-level fields into Cassandra columns, leveraging collections as appropriate.
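A sketch of what that can look like with the nested types from CASSANDRA-5590 (Cassandra 2.1+); the type and table names are illustrative, mirroring the JSON above:

CREATE TYPE segment (
  segment_id text,
  segment_name text,
  segment_value text
);

CREATE TYPE field3_item (
  id text,
  name text,
  segments list<frozen<segment>>
);

CREATE TABLE a_records (
  a_id text PRIMARY KEY,
  field1 text,
  field2 text,
  field3 list<frozen<field3_item>>
);

-- the single access pattern: find by A_ID
SELECT * FROM a_records WHERE a_id = '1111';

Each frozen element is written as a single value, so updating one segment means rewriting that element, while top-level columns like field1 can be updated independently.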
As you will be using it just as a key/value store, you could actually store it either as JSON or, to save data more efficiently, as something like BSON or even Protobuf.
I personally would store it as a Protobuf record, as it doesn't store the field names, which may be repeated in your case.