I have an item such as:
"Item": {
"model": 32590038899,
"date": 10-09-2015,
"price":100
}
"hash_key_attribute_name": "model"
"range_key_attribute_name": "date"
My problem is that new items get inserted a lot, so there is a chance that items with the same model number may come and they may not be the same product. This can happen because of the regions where the product is available. So I need a setup where I keep a copy of the product if such a case arises, and in the future, after inspection or on requirement, I can bring the item back. I am looking for a kind of versioning system. Currently the product gets overwritten because the new item has the same primary key.
Put simply: if your primary key isn't unique, then you aren't using the correct field as your primary key.
...items with the same model number may come and they may not be the
same product. This can happen because of the regions where the product
is available...
It sounds like your primary key needs to be a composite of region and product ID. Or perhaps you need a separate table for each region.
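For illustration, a minimal sketch of what such a composite key could look like (the regionDate attribute, its "<region>#<date>" format, and the EU value are hypothetical, not from the question):

// Hypothetical composite key: model stays the hash key, and the range key
// combines region and date so the same model from two regions cannot collide.
const item = {
  model: 32590038899,           // hash key
  regionDate: "EU#10-09-2015",  // range key: "<region>#<date>"
  price: 100,
};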
I would add a unique id attribute to your item; a UUID works well. For example:
{ id: "53f382ae-94b8-4910-b8f1-384f46dc10d8",
model: 32590038899,
date: "10-09-2015",
price:100
}
Change your table schema to have just a hash key attribute: id.
Add a global secondary index with hash key model and range key date.
Global secondary index keys do not have to be unique, so items whose model and date collide can coexist. With this schema you prevent items from being overwritten, and you can still query for items with a given model number.
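A minimal sketch of that schema with the AWS SDK for JavaScript v3 (the table name, index name, and billing mode are placeholder choices, not from the answer):

import { DynamoDBClient, CreateTableCommand } from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient({});

// Table keyed only on the unique id; model/date move to a GSI.
await client.send(new CreateTableCommand({
  TableName: "Items",                      // hypothetical table name
  AttributeDefinitions: [
    { AttributeName: "id", AttributeType: "S" },
    { AttributeName: "model", AttributeType: "N" },
    { AttributeName: "date", AttributeType: "S" },
  ],
  KeySchema: [{ AttributeName: "id", KeyType: "HASH" }],
  GlobalSecondaryIndexes: [{
    IndexName: "model-date-index",         // hypothetical index name
    KeySchema: [
      { AttributeName: "model", KeyType: "HASH" },
      { AttributeName: "date", KeyType: "RANGE" },
    ],
    Projection: { ProjectionType: "ALL" },
  }],
  BillingMode: "PAY_PER_REQUEST",
}));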
Related
I am developing an app using AWS DynamoDB as the database, connected from Node.js. I have a table called chat to store messages. In the chat table, every item has connectionid as the partition key. An item looks like this:
{"id": "123456", "messages": [{"id": "1214545","message":"ksadfksfmsfsdf","sender":"112415",
"timestamp": "27:1:2022:11:53"}]}
This is a single item, and there are more inside the chat table.
I have been trying to delete a message inside an item. It's easy to delete a whole item because connectionid is the partition key, but I want to delete by the message id, which is nested. How can I achieve this?
But I want to delete by the message id, which is nested. How can I achieve this?
No, you cannot. It is not possible to have a nested primary key or a secondary index on a nested attribute.
I would recommend remodeling your table schema; you can use a composite key, for example:
Keep the messages array as it is; just move the message id out of the nesting.
You can have the chat "id" ("123456") as the sort key and the message "id" ("1214545") as the partition key.
A combination of both gives you the ability to find the unique record and delete it (see the sketch after the link below).
If you want to delete multiple messages, you can find them with a Query operation on the partition key "id": "1214545" and delete each one.
Or a very simple solution would be to have the message id as your single primary key, if it is going to be unique across all records.
Note: with a composite key, the partition key can have the same value across items, but the combination of partition key and sort key must always be unique.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html#HowItWorks.CoreComponents.PrimaryKey
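A minimal sketch of such a delete with the AWS SDK for JavaScript v3, assuming the remodeled table described above with the message id as partition key and the chat id as sort key (the table name and the messageId/chatId attribute names are hypothetical):

import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, DeleteCommand } from "@aws-sdk/lib-dynamodb";

const doc = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Delete one message: both parts of the composite key identify it uniquely.
await doc.send(new DeleteCommand({
  TableName: "chat",
  Key: {
    messageId: "1214545", // partition key (the formerly nested "id")
    chatId: "123456",     // sort key (the former top-level "id")
  },
}));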
I am trying to set up a new Cosmos DB and it's asking me to set a partition key. I think I understand the concept: I should select a JSON field that can group my documents efficiently.
Is it possible to configure the collection to use a JSON field that may not exist in every incoming document?
For example:
{
    "name": "Robin",
    "DOB": "01/01/1969",
    "scans": {
        "bloodType": "O"
    }
}
{
    "name": "Bill",
    "DOB": "01/01/1969"
}
Can I use /scans/bloodType as the partition key? For documents that don't have a scans field, I still want to keep the data, since I can update the document later.
You can, indeed, specify a partition key that might not exist in every document. When you save a document that's missing the property specified by your partition key, it will result in assigning an "undefined" value for its partition key.
In the future, if you wanted to provide a value for the partition key of such a document, you'd have to delete, and then re-add, the document. You cannot modify a property's value when it happens to be the partition key within that document's container (nor can you add the property to an existing document that doesn't explicitly have that property defined, since it's already been assigned the "undefined" value).
See this answer for more details on undefined partition key properties.
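A minimal sketch with the @azure/cosmos SDK showing a container partitioned on /scans/bloodType accepting a document that lacks the property (the endpoint, key, and names are placeholders):

import { CosmosClient } from "@azure/cosmos";

const client = new CosmosClient({ endpoint: "https://your-account.documents.azure.com", key: "<key>" });
const { database } = await client.databases.createIfNotExists({ id: "demo" });
const { container } = await database.containers.createIfNotExists({
  id: "people",
  partitionKey: { paths: ["/scans/bloodType"] },
});

// Has a value at the partition key path:
await container.items.create({ name: "Robin", DOB: "01/01/1969", scans: { bloodType: "O" } });

// Missing the path entirely: stored with an "undefined" partition key value.
await container.items.create({ name: "Bill", DOB: "01/01/1969" });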
Unfortunately you can't do that.
As per the official docs, the partition key path (/scans/bloodType) or the partition key value cannot be changed:
Be a property that has a value which does not change. If a property is your partition key, you can't update that property's value.
In terms of solutions, you could either try to find another partition key property path and ensure there's a value at creation time, or use a secondary collection to store your incomplete documents and use the change feed to "move" them to the final collection once all the data becomes available.
Something like blood type wouldn't be a good partition key, assuming it refers to A/B/O types etc., since there are only a small number of possible values. What you can generally fall back on is the unique id property of the item, which Cosmos creates automatically. It's fine to have one item per partition if nothing else is a better candidate. Per the docs:
If your container has a property that has a wide range of possible values, it is likely a great partition key choice. One possible example of such a property is the item ID. For small read-heavy containers or write-heavy containers of any size, the item ID is naturally a great choice for the partition key.
I am working on Azure and I have created a database with Cosmos DB SQL API. After creating a container I see that the primary key is always named "id". Is there any way to create a container with PK with name different than "id"?
Every document has a unique identifier named id. This cannot be altered. If you don't set a value, it is assigned a GUID.
When using methods such as ReadDocument() (or equivalent, based on the SDK) for direct reads instead of queries, a document's id property must be specified.
Now, as far as partitioning goes, you can choose any property you want to use as the partition key.
And if you have additional domain-specific identifiers (maybe a part number), you can always store those in their own properties as well. In this example though, just remember that, while you can query for documents containing a specific part number, you can only do a direct-read via id. If direct reads will be the majority of your read operations, then it's worth considering the use of id to store such a value.
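A short sketch of the difference with the @azure/cosmos SDK (the partNumber property, names, and values are hypothetical):

import { CosmosClient } from "@azure/cosmos";

const container = new CosmosClient({ endpoint: "https://your-account.documents.azure.com", key: "<key>" })
  .database("demo")
  .container("parts");

// Direct read: requires the id (and the partition key value).
const { resource: part } = await container.item("part-001", "somePartitionKeyValue").read();

// Query: can filter on any property, e.g. a domain-specific part number.
const { resources } = await container.items
  .query({
    query: "SELECT * FROM c WHERE c.partNumber = @pn",
    parameters: [{ name: "@pn", value: "PN-1234" }],
  })
  .fetchAll();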
PK = primary key, by the way ;) It's the "main" key for the record, it is not private :)
And no, as far as I know you cannot change the primary key name. It is always "id". Specifically, it must be lowercase; "Id" is not accepted.
I have a table called Media, where the primary key is the mediaId. I have an additional table called Media_Comments which has a commentId as the primary key, and a mediaId attribute that stores the mediaId that that comment is linked to. Same with Media_Likes, I have a primary key of mediaId and sort key of userId. I want to handle a case where a Media item is deleted by a user, which will then cause a mass deletion of all comments and likes of that Media item. I am currently writing this code in a lambda using Node.js.
I tried using a regular delete with the condition 'mediaId = :mediaId', but it complained about needing the table's primary key. Unfortunately, in many cases when I want a media item deleted I won't have the specific key attributes available to fulfill that condition. I looked into deleting by a certain index, like setting a GSI on the mediaId in each table and deleting by that, but that does not seem to be an option either.
Basically, am I missing something? Is there actually a way to delete by an index? And if not, what would be the best way to do this? Setting a TTL on each item in DynamoDB that is affiliated with the Media item? Or is there another recommended way to handle this problem?
Any help is greatly appreciated, thank you.
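For what it's worth, there is no delete-by-index operation in DynamoDB; a common pattern is to query a GSI for the matching keys and then batch-delete the items by primary key. A minimal sketch, assuming Media_Comments has a (hypothetical) GSI named mediaId-index with mediaId as its hash key:

import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand, BatchWriteCommand } from "@aws-sdk/lib-dynamodb";

const doc = DynamoDBDocumentClient.from(new DynamoDBClient({}));

async function deleteCommentsForMedia(mediaId: string): Promise<void> {
  // 1. Find the primary keys of all comments for this media via the GSI.
  //    (Pagination via LastEvaluatedKey omitted for brevity.)
  const { Items = [] } = await doc.send(new QueryCommand({
    TableName: "Media_Comments",
    IndexName: "mediaId-index", // hypothetical GSI
    KeyConditionExpression: "mediaId = :m",
    ExpressionAttributeValues: { ":m": mediaId },
  }));

  // 2. Delete by primary key, at most 25 requests per batch.
  //    (Retrying UnprocessedItems omitted for brevity.)
  for (let i = 0; i < Items.length; i += 25) {
    await doc.send(new BatchWriteCommand({
      RequestItems: {
        Media_Comments: Items.slice(i, i + 25).map((item) => ({
          DeleteRequest: { Key: { commentId: item.commentId } },
        })),
      },
    }));
  }
}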
With Cassandra, I want to represent each user object with a unique uuid, but also keep a set of zero or more secondary keys that map to the user. Each secondary key should map to one and only one userid. Because I need fast lookups from a secondary key to a user, I maintain a separate lookup table instead of a secondary index.
I've modelled the data like this, but I am open to alternatives:
CREATE TABLE users (
    userid uuid PRIMARY KEY,
    name text,
    secondarykeys set<text>
);

CREATE TABLE user_secondarykeys (
    secondarykey text,
    userid uuid,
    PRIMARY KEY (secondarykey)
);
A typical use case is this:
I get a user with a secondary key, mail:andreas@example.org, and I would like to see whether any user with that secondary key exists; if not, I would like to create a new user object.
I can look for the secondary key:
SELECT * FROM "user_secondarykeys" WHERE secondarykey = "mail:andreas#example.org";
and if I do not find any matches, I can insert a new user:
BEGIN BATCH
    INSERT INTO users (userid, name, secondarykeys) VALUES (77059e45-5fac-460b-9c4f-47528c292be0, 'Andreas', {'mail:andreas@example.org'});
    INSERT INTO user_secondarykeys (secondarykey, userid) VALUES ('mail:andreas@example.org', 77059e45-5fac-460b-9c4f-47528c292be0);
APPLY BATCH;
My problem is that this can lead to inconsistent data, because a user with that secondary key can be inserted in the window between my SELECT and my inserts.
I'm thinking that if I could make my INSERT fail when the secondary key already exists in user_secondarykeys, that would work, because the atomicity of the transaction should then also revert the insert into the users table. However, I do not know of any way to make the INSERT fail if the secondary key exists. If I add IF NOT EXISTS to the second insert, it will not revert the transaction; it will just skip the insert into user_secondarykeys while still inserting into users.
Any suggestions on how to implement this use case in a reliable way are appreciated. Thanks.
First, I think your model is pretty complicated, and I'm not sure I understand all of your requirements correctly.
So if you receive the secondary key first and then have to decide what to do, add the user or not, the following will work for you:
Instead of checking the user_secondarykeys table with a SELECT statement for the occurrence of a particular secondary key, go with the following:
INSERT INTO user_secondarykeys (secondarykey, userid) VALUES ('mail:andreas@example.org', 77059e45-5fac-460b-9c4f-47528c292be0) IF NOT EXISTS;
If it applies, that secondary key is not connected to any user, which leaves two cases: the user doesn't exist, or the user exists and someone wants to add a new secondary key for them. The following will do the job in both cases:
UPDATE users SET name = 'Andreas', secondarykeys = secondarykeys + {'mail:andreas@example.org'} WHERE userid = 77059e45-5fac-460b-9c4f-47528c292be0;
Because writes in Cassandra are idempotent upserts (except for counters), this UPDATE works even if a user with that id already exists in the users table; it just adds another secondary key for them, and it creates the row if it doesn't exist yet.
The pros of this solution are that you remove the time gap that could make you 'inconsistent', and you have a guarantee that no one will insert two users with the same secondary key. You specified that a user can have no secondary keys at all; in that situation you can add them straight to the users table.
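A minimal sketch of this two-step flow with the Node.js cassandra-driver, checking whether the lightweight transaction applied (the client settings and the function are illustrative, not from the answer):

import { Client } from "cassandra-driver";

const client = new Client({
  contactPoints: ["127.0.0.1"],   // placeholder
  localDataCenter: "datacenter1", // placeholder
  keyspace: "demo",               // placeholder
});

async function addUserIfKeyFree(userid: string, name: string, key: string): Promise<boolean> {
  // Step 1: claim the secondary key with a lightweight transaction.
  const result = await client.execute(
    "INSERT INTO user_secondarykeys (secondarykey, userid) VALUES (?, ?) IF NOT EXISTS",
    [key, userid],
    { prepare: true }
  );
  if (!result.wasApplied()) {
    return false; // the key already maps to a user; result.rows[0] holds the existing mapping
  }
  // Step 2: idempotent upsert; creates the user row if needed and appends the key.
  await client.execute(
    "UPDATE users SET name = ?, secondarykeys = secondarykeys + ? WHERE userid = ?",
    [name, [key], userid],
    { prepare: true }
  );
  return true;
}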
I'm thinking that if I could make my INSERT fail when the secondary key already exists in user_secondarykeys, that would work, because the atomicity of the transaction should then also revert the insert into the users table. However, I do not know of any way to make the INSERT fail if the secondary key exists. If I add IF NOT EXISTS to the second insert, it will not revert the transaction; it will just skip the insert into user_secondarykeys while still inserting into users.
Since Cassandra 2.0.6 you can use conditional statements inside a batch, and if any condition is not met, none of the instructions in that batch fire. This sounds great, but there is a limitation: all of the statements inside the batch have to operate on the same partition. Consequently, cross-partition or cross-table conditional inserts/updates/deletes are impossible. So in your case this:
BEGIN BATCH
    INSERT INTO users (userid, name, secondarykeys) VALUES (77059e45-5fac-460b-9c4f-47528c292be0, 'Andreas', {'mail:andreas@example.org'});
    INSERT INTO user_secondarykeys (secondarykey, userid) VALUES ('mail:andreas@example.org', 77059e45-5fac-460b-9c4f-47528c292be0) IF NOT EXISTS;
APPLY BATCH;
will not even pass query validation, because it tries to operate on two different tables.
I'm not sure whether this will suit the rest of your requirements; I would need more information about your queries and the velocity/volume of the data. There are certainly other ways to model this.
It would greatly simplify the problem if every user had to have at least one secondary key (e.g. email would be a great unique key for your users table), but those are your requirements, and unless you can change them there is nothing to discuss.
Hope this will help you a bit.
Good luck!