Is it possible to define a PRIMARY KEY in an Azure Cosmos DB Cassandra ARM template?
Let's say I have the following table:
CREATE TABLE foo (
    id text,
    name text,
    PRIMARY KEY (id)
);
And my ARM template:
"schema":{
"columns":[
{
"name":"id",
"type":"text"
}
],
"partitionKeys":[
{"name":"id"} // how to define primary key ?
}
A primary key in Cassandra consists of one or more partition key columns and zero or more clustering columns. In ARM templates they are defined as the partitionKeys and clusterKeys arrays of objects. Here is the example from the documentation:
"partitionKeys": [
{ "name": "machine" },
{ "name": "cpu" },
{ "name": "mtime" }
],
"clusterKeys": [
{
"name": "loadid",
"orderBy": "asc"
}
]
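Applying that to the foo table from the question: PRIMARY KEY (id) has a single partition column and no clustering columns, so only partitionKeys is needed. A minimal sketch of the full table resource (the apiVersion and parameter names here are illustrative):
{
  "type": "Microsoft.DocumentDB/databaseAccounts/cassandraKeyspaces/tables",
  "apiVersion": "2021-10-15",
  "name": "[concat(parameters('accountName'), '/', parameters('keyspaceName'), '/foo')]",
  "properties": {
    "resource": {
      "id": "foo",
      "schema": {
        "columns": [
          { "name": "id", "type": "text" },
          { "name": "name", "type": "text" }
        ],
        "partitionKeys": [
          { "name": "id" }
        ]
      }
    },
    "options": {}
  }
}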
I have an Azure Cosmos DB SQL API account with one container, "EmployeeContainer", with the partition key "personId". I have three different types of collections in this container. Their schemas are as shown below:
Person Collection:
{
"Id": "1234569",
"personId" : "P1241234",
"FirstName": "The first name",
"LarstName": "The last name"
}
Person-Department Collection:
{
"Id": "923456757",
"personId" : "P1241234",
"departmentId": "dept01",
"unitId": "unit01",
"dateOfJoining": "joining date"
}
Department-Employees
{
"id": "678678",
"departmentId" : "dept01",
"departmentName": "IT",
"employees" : [
{ "personId": "P1241234" },
{ "personId": "P1241235" },
{ "personId": "P1241236" },
]
}
How will the data be stored in the logical partitions? personId is the partition key, and all the collections have personId in them. So, will the document in the Person collection with personId "P1241234" and the document in the Person-Department collection with personId "P1241234" be stored in the same logical partition? How will the data in Department-Employees be stored?
This design is not optimal. You should combine Person and Person-Department into a single collection using personId as the partition key, then have a second container for departments, with departmentId as its partition key, where each person is another row along with any other properties you need when querying that collection. Do not write code that queries both containers; each should have all the data it needs to satisfy any request you make. Then use change feed to keep them in sync.
You can get more details on how to model this in the article here.
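As a rough illustration, a combined document in the person container might look like this (the type discriminator field is an assumption; the values come from the question's examples):
{
  "id": "1234569",
  "personId": "P1241234",
  "type": "person",
  "firstName": "The first name",
  "lastName": "The last name",
  "departmentId": "dept01",
  "unitId": "unit01",
  "dateOfJoining": "joining date"
}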
Yes, that is true: documents with the same personId will be stored in the same logical partition, regardless of their type/schema. I'm not sure you can create documents without the partition key in a collection that has a partition key defined, but if you can, all of them should end up in the same logical partition (though I don't think you can create them).
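To illustrate, a single query scoped to that partition key value would return every document type stored for the person in one in-partition query (SQL API query shape):
SELECT * FROM c WHERE c.personId = "P1241234"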
I am new to Vitess. We have deployed Vitess in Kubernetes using Helm charts, exposed VTGate through a NodePort, and we are able to connect MySQL Workbench using the exposed VTGate IP address.
When we insert records using MySQL Workbench, the records are not distributed across the shards.
* 'Payrolls' is the keyspace, and we created two shards: Payrolls/-80 and Payrolls/80-.
* Table schema as below:
1. TenantList
(
TenantID INT,
Name VARCHAR(200)
)
2. PayrollMaster
(
PayrollID INT PRIMARY KEY,
HHAID INT FOREIGN KEY TO TenantList(TenantID)
)
3. PayrollDetails
(
PayrollDetailID INT PRIMARY KEY,
PayrollID INT FOREIGN KEY TO PayrollMaster(PayrollID),
HHAID INT FOREIGN KEY TO TenantList(TenantID)
)
* VSchema is as below
{
"sharded": true,
"vindexes": {
"hash": {
"type": "hash"
}
},
"tables": {
"payrollmaster": {
"column_vindexes":
[
{
"column": "HHA",
"name": "hash"
}
],
"auto_increment": {
"column": "PayrollInternalID",
"sequence": "payrollmaster_seq"
}
},
"payrolldetails": {
"column_vindexes":
[
{
"column": "HHA",
"name": "hash"
}
],
"auto_increment": {
"column": "PayrollDetailInternalID",
"sequence": "payrolldetails_seq"
}
}
}
}
We are inserting the records like below:
use Payrolls;
insert into TenantList, PayrollMaster, PayrollDetails
We are expecting the records to be scattered across the shards, but the tables have zero rows in the shards.
Is there any issue with the way we are trying this?
Thanks
It looks like you want payrolls to be sharded. If so, you shouldn't create a shard "0". You should only specify shards "-80" and "80-"
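For context, with a two-shard keyspace the hash vindex maps each sharding column value to a keyspace ID, and the owning shard is picked by the range that ID falls into; roughly:
Payrolls/-80  : keyspace IDs from 0x00... up to (but not including) 0x80...
Payrolls/80-  : keyspace IDs from 0x80... through 0xFF...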
I use the Azure Search indexer to index documents from a Cosmos DB MongoDB API database that contains objects with fields named _id.
As Azure Search does not allow underscores at the beginning of a field name in the index, I want to create a field mapping.
JSON structure in Cosmos --> structure in index
{
"id": "test"
"name": "test",
"productLine": {
"_id": "123", --> "id": "123"
"name": "test"
}
}
The documentation has exactly this scenario as an example, but only for a top-level field:
"fieldMappings" : [ { "sourceFieldName" : "_id", "targetFieldName" : "id" } ]
I tried the following:
"fieldMappings" : [ { "sourceFieldName" : "productLine/_id", "targetFieldName" : "productLine/id" } ] }
that results in an error stating:
Value is not accepted. Valid values: "doc_id", "name", "productName".
What is the correct way to create a mapping for a target field that is a subfield?
It's not possible to directly map subfields. You can get around this by adding a Skillset with a Shaper cognitive skill to the indexer, and an output field mapping.
You will also want to attach a Cognitive Services resource to the skillset. The shaper skill doesn't get billed, but attaching a Cognitive Services resource allows you to process more than 20 documents per day.
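For reference, attaching the resource is done with a cognitiveServices section in the skillset definition; a minimal sketch (the key and description are placeholders):
"cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
    "description": "my cognitive services resource",
    "key": "<cognitive-services-key>"
}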
Shaper skill
{
"#odata.type": "#Microsoft.Skills.Util.ShaperSkill",
"context": "/document",
"inputs": [
{
"name": "id",
"source": "/document/productLine/_id"
},
{
"name": "name",
"source": "/document/productLine/name"
}
],
"outputs": [
{
"name": "output",
"targetName": "renamedProductLine"
}
]
}
Indexer skillset and output field mapping
"skillsetName": <skillsetName>,
"outputFieldMappings": [
{
"sourceFieldName": "/document/renamedProductLine",
"targetFieldName": "productLine"
}
]
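The target index also needs productLine defined as a complex field so the shaped object has somewhere to land; a minimal sketch of that field definition (subfield names assumed from the question):
{
    "name": "productLine",
    "type": "Edm.ComplexType",
    "fields": [
        { "name": "id", "type": "Edm.String" },
        { "name": "name", "type": "Edm.String" }
    ]
}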
Is there any way to query a DynamoDB table with multiple values for a single attribute?
TableName: "sdfdsgfdg"
IndexName: 'username-category-index',
KeyConditions: {
"username": {
"AttributeValueList": { "S": "aaaaaaa#gmail.com" }
,
"ComparisonOperator": "EQ"
},
"username": {
"AttributeValueList": { "S": "hhhhh#gmail.com" }
,
"ComparisonOperator": "EQ"
},
"category": {
"AttributeValueList": { "S": "Coupon" }
,
"ComparisonOperator": "EQ"
}
}
The BatchGetItem API can be used to get multiple items from a DynamoDB table. However, it can't be used in your use case as you are getting the data from an index.
The BatchGetItem operation returns the attributes of one or more items
from one or more tables. You identify requested items by primary key.
From an API perspective, there is no other solution. You may need to look at it from a data modelling perspective and design the table/index to satisfy your query access pattern (QAP).
Also, please note that querying the index multiple times with different partition key values (i.e. a small number of them) wouldn't impact performance as long as it is a handful of items.
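A minimal sketch of that approach, issuing one Query per username value against the same index and merging the results client-side (table, index, and values taken from the question):
// Query 1
{
    TableName: "sdfdsgfdg",
    IndexName: "username-category-index",
    KeyConditions: {
        "username": {
            "AttributeValueList": [ { "S": "aaaaaaa#gmail.com" } ],
            "ComparisonOperator": "EQ"
        },
        "category": {
            "AttributeValueList": [ { "S": "Coupon" } ],
            "ComparisonOperator": "EQ"
        }
    }
}
// Query 2: the same request with "username" set to { "S": "hhhhh#gmail.com" }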
I have a problem with representing a complex data structure in Cassandra.
JSON example of the data:
{
"A": {
"A_ID" : "1111"
"field1": "value1",
"field2": "value2",
"field3": [
{
"id": "id1",
"name": "name1",
"segment": [
{
"segment_id": "segment_id_1",
"segment_name": "segment_name_1",
"segment_value": "segment_value_1"
},
{
"segment_id": "segment_id_2",
"segment_name": "segment_name_2",
"segment_value": "segment_value_2"
},
...
]
},
{
"id": "id2",
"name": "name2",
"segment": [
{
"segment_id": "segment_id_3",
"segment_name": "segment_name_3",
"segment_value": "segment_value_3"
},
{
"segment_id": "segment_id_4",
"segment_name": "segment_name_4",
"segment_value": "segment_value_4"
},
...
]
},
...
]
}
}
Only one query will be used:
Find by A_ID.
I think this data should be stored in one table (column family), without serialization/deserialization operations, for better efficiency.
How can I do this if CQL does not support nested maps and lists?
Cassandra 2.1 adds support for nested structures: https://issues.apache.org/jira/browse/CASSANDRA-5590
The downside to "just store it as a json/protobuf/avro/etc blob" is that you have to read-and-rewrite the entire blob to update any field. So at the very least you should pull your top level fields into Cassandra columns, leveraging collections as appropriate.
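A minimal sketch of that approach using frozen user-defined types (Cassandra 2.1+; the type and table names here are illustrative):
CREATE TYPE segment (
    segment_id text,
    segment_name text,
    segment_value text
);

CREATE TYPE field3_item (
    id text,
    name text,
    segments list<frozen<segment>>
);

CREATE TABLE a_by_id (
    a_id text PRIMARY KEY,
    field1 text,
    field2 text,
    field3 list<frozen<field3_item>>
);
Everything is still fetched with a single SELECT ... WHERE a_id = ?, while field1 and field2 can be updated without rewriting the nested collections.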
As you will be using it just as a key/value store, you could actually store it either as JSON or, to save space, as something like BSON or even Protobuf.
I personally would store it as a Protobuf record, as it doesn't store the field names, which would otherwise be repeated in your case.
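A minimal sketch of what such a Protobuf schema could look like for the JSON above (message and field names are illustrative):
syntax = "proto3";

message Segment {
    string segment_id = 1;
    string segment_name = 2;
    string segment_value = 3;
}

message Field3Item {
    string id = 1;
    string name = 2;
    repeated Segment segment = 3;
}

message A {
    string a_id = 1;
    string field1 = 2;
    string field2 = 3;
    repeated Field3Item field3 = 4;
}
On the wire only the field numbers are encoded, so the repeated segment_id/segment_name/segment_value keys are not stored for every element.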