Mass delete items in DynamoDB - node.js

I have a table called Media, where the primary key is the mediaId. I have an additional table called Media_Comments, which has commentId as its primary key and a mediaId attribute that stores the mediaId the comment is linked to. Similarly, Media_Likes has a primary key of mediaId and a sort key of userId. I want to handle the case where a Media item is deleted by a user, which should then trigger a mass deletion of all comments and likes of that Media item. I am currently writing this code in a Lambda using Node.js.
I tried using a regular delete based on the condition 'mediaId = :mediaId', but it complained about needing the table's primary key. Unfortunately, in many cases when I want a media item deleted I won't have the specific key attributes available to fulfill that condition. I looked into deleting by an index instead, e.g. setting a GSI on the mediaId in each table and deleting by that, but unfortunately that does not seem to be an option either.
Basically, am I missing something? Is there actually a way to delete by an index? And if not, what would be the best way to do this? Setting a TTL for each item in DynamoDB that is affiliated with the Media item? Or is there another recommended way to handle this problem?
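For concreteness, what "deleting by an index" would presumably have to look like is a GSI query followed by batch deletes - a rough Node.js sketch, assuming a hypothetical GSI named mediaId-index on Media_Comments and the AWS SDK v2 DocumentClient:
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

async function deleteCommentsForMedia(mediaId) {
  // 1. Query the GSI to collect the primary keys of the comments to remove.
  //    (A full implementation would also follow LastEvaluatedKey pagination.)
  const { Items = [] } = await docClient.query({
    TableName: 'Media_Comments',
    IndexName: 'mediaId-index',
    KeyConditionExpression: 'mediaId = :mediaId',
    ExpressionAttributeValues: { ':mediaId': mediaId },
  }).promise();

  // 2. BatchWriteItem accepts at most 25 requests per call, so delete in chunks.
  for (let i = 0; i < Items.length; i += 25) {
    await docClient.batchWrite({
      RequestItems: {
        Media_Comments: Items.slice(i, i + 25).map((item) => ({
          DeleteRequest: { Key: { commentId: item.commentId } },
        })),
      },
    }).promise();
  }
}
(The same pattern would apply to Media_Likes, using its mediaId/userId key.)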
Any help is greatly appreciated, thank you.

Related

Name the primary key of Container in Cosmos DB SQL API different than "id"

I am working on Azure and I have created a database with Cosmos DB SQL API. After creating a container I see that the primary key is always named "id". Is there any way to create a container with PK with name different than "id"?
Every document has a unique id property, called id. This cannot be altered. If you don't set a value, it is assigned a GUID.
When using methods such as ReadDocument() (or equivalent, based on the SDK) for direct reads instead of queries, a document's id property must be specified.
Now, as far as partitioning goes, you can choose any property you want to use as the partition key.
And if you have additional domain-specific identifiers (maybe a part number), you can always store those in their own properties as well. In this example though, just remember that, while you can query for documents containing a specific part number, you can only do a direct-read via id. If direct reads will be the majority of your read operations, then it's worth considering the use of id to store such a value.
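To make the difference concrete, here is a rough sketch with the JavaScript SDK (@azure/cosmos); the database, container, id and part-number values are illustrative only, and the container is assumed to be partitioned on /partNumber:
const { CosmosClient } = require('@azure/cosmos');

async function readExamples(endpoint, key) {
  const client = new CosmosClient({ endpoint, key });
  const container = client.database('mydb').container('parts');

  // Direct (point) read: only possible via the document's id (+ partition key).
  const { resource: byId } = await container.item('a1b2c3', 'PN-1001').read();

  // Lookup by a domain identifier stored in its own property requires a query.
  const { resources: byPartNumber } = await container.items
    .query({
      query: 'SELECT * FROM c WHERE c.partNumber = @pn',
      parameters: [{ name: '@pn', value: 'PN-1001' }],
    })
    .fetchAll();

  return { byId, byPartNumber };
}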
PK = primary key, by the way ;) It's the "main" key for the record, it is not private :)
And no, as far as I know you cannot change the primary key name. It is always "id". Specifically it must be in lowercase, "Id" is not accepted.

Find the history of Deleted Data in QLDB

I created a Vehicle table in the ledger, added some vehicles in QLDB, and then deleted the vehicle data. Now I am not able to fetch the metadata id, because the user view and committed view only contain the latest non-deleted version of the application data. So I am not able to fetch the history of that deleted data through the metadata. Kindly help me with a PartiQL query to fetch the history, if there is a way.
Note: I don't have a vehicle registration table that stores the metadataId of vehicles.
The way you are doing it is correct. First, you filter history by some known attribute (in this case, a user-defined primary key such as 'VIN') and retrieve the document id. After that, you can filter history using that document id.
The second query should return the same revisions as the first, but it will also contain the deletion information (the first query will not include it, because the deletion removes the attribute you are filtering on).
Note that the document id is returned as part of the DELETE PartiQL statement.
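A rough sketch of those two steps with the Node.js driver (amazon-qldb-driver-nodejs); the ledger name, table name and VIN below are illustrative only:
const { QldbDriver } = require('amazon-qldb-driver-nodejs');

const driver = new QldbDriver('vehicles-ledger');

async function historyForVin(vin) {
  return driver.executeLambda(async (txn) => {
    // Step 1: find the document id via a known attribute (here, the VIN).
    const idResult = await txn.execute(
      'SELECT h.metadata.id AS docId FROM history(Vehicle) AS h WHERE h.data.VIN = ?',
      vin
    );
    const docId = idResult.getResultList()[0].get('docId').stringValue();

    // Step 2: fetch the full history for that document id; this includes the
    // deletion revision, which step 1 alone cannot match on.
    const fullHistory = await txn.execute(
      'SELECT * FROM history(Vehicle) AS h WHERE h.metadata.id = ?',
      docId
    );
    return fullHistory.getResultList();
  });
}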

RethinkDB: How do I create a custom duplicate check on insert

I want to bulk insert an array of data using NodeJS and RethinkDB, but I don't want to insert records that already exist (where name and value already have a record; I don't want to duplicate-check on the primary key id).
[
{name:"Robert", value:"1337"},
{name:"Martin", value:"0"},
{name:"Oskar", value:"1"}
]
If any of the above values already exist, don't insert, but update "value".
My current working solution is to loop through the array and first check whether the record exists using a filter; if not, I insert it. But it's very slow on 10,000 records.
I don't think RethinkDB has that kind of concept built in. I tried to read the docs further. To insert a new document, use insert; to update a field, use update; to replace a document with a whole new one, use replace (the primary key won't change). So I don't think it's possible directly in RethinkDB.
Here are some ways you can make it run faster:
Create a compound index containing those two fields: name and value
Then use that index to check for existence instead of using filter
Generate your own id field instead of letting RethinkDB generate it. That way you know the primary key and can look up the document with get, which is very fast (see the sketch below).
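A rough sketch of the "generate your own id" idea with the official rethinkdb Node driver - here the primary key is derived from name so duplicates collapse onto the same document, and the insert's conflict: 'update' option (rather than a separate get) turns the bulk insert into an upsert; the table name is illustrative:
const r = require('rethinkdb');

async function upsertAll(conn, rows) {
  // Derive a deterministic primary key from the field that defines uniqueness.
  const docs = rows.map((row) => ({ id: row.name, ...row }));

  // conflict: 'update' makes the bulk insert behave as an upsert: existing ids
  // get their fields (including "value") updated, new ids are inserted.
  return r.table('scores').insert(docs, { conflict: 'update' }).run(conn);
}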
I had a similar requirement in a RethinkDB project, but in that case the primary key was being checked for duplicates, and it was also custom instead of being auto-generated.
What you could do is run an async.series or async.waterfall two-step check. First pick a single object from your array, then filter the database for the name-value pair of your current object. If the results come up empty, it is unique. If not, you have a pre-existing record with the same details.
Depending on the result, you can then pass control to the next step, which will either insert the new document or update the existing one. It will be simpler if you use a flag for this in async.waterfall.
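A rough sketch of that two-step check using plain async/await instead of async.waterfall; it filters on name only, so a changed value results in an update, and the table name is illustrative:
const r = require('rethinkdb');

async function insertOrUpdate(conn, row) {
  // Step 1: look for an existing record with the same name.
  const cursor = await r.table('scores').filter({ name: row.name }).run(conn);
  const matches = await cursor.toArray();

  if (matches.length === 0) {
    // No match: the record is unique, so insert it.
    return r.table('scores').insert(row).run(conn);
  }
  // A record with this name already exists: update its value instead.
  return r.table('scores').get(matches[0].id).update({ value: row.value }).run(conn);
}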

Data modelling for consistent secondary keys with Cassandra

With Cassandra,
I want to represent each user object with a unique uuid, but also keep a set of zero or more secondary user keys that map to that user. Each secondary key should map to one and only one user(id). Because I need a quick lookup from a secondary key to a user, I maintain a separate lookup table instead of a secondary index.
I've modelled the data like this, but I am open to alternatives:
CREATE TABLE users (
userid uuid PRIMARY KEY,
name text,
secondarykeys set<text>
);
CREATE TABLE user_secondarykeys (
secondarykey text,
userid uuid,
PRIMARY KEY(secondarykey)
);
A typical use case is this:
I got this user with a secondary key mail:andreas@example.org, and I would like to see if there exists any user with that secondary key; if it does not exist, I would like to create a new user object.
I can look for the secondary key:
SELECT * FROM user_secondarykeys WHERE secondarykey = 'mail:andreas@example.org';
and if I do not find any matches, I can insert a new user:
BEGIN BATCH
INSERT INTO users (userid, name, secondarykeys) VALUES (77059e45-5fac-460b-9c4f-47528c292be0, 'Andreas', {'mail:andreas@example.org'});
INSERT INTO user_secondarykeys (secondarykey, userid) VALUES ('mail:andreas@example.org', 77059e45-5fac-460b-9c4f-47528c292be0);
APPLY BATCH;
My problem is that this can lead to inconsistent data, because a user can be inserted with that secondary key in the meantime between my select and my inserts.
I'm thinking that if I can make my INSERT transaction fail if the secondary key already exists in user_secondarykeys, that would work, because it should then also revert the insert into the users table, thanks to the atomic property of the transaction. However, I do not know of any way to make the INSERT fail if the secondary key exists. If I add IF NOT EXISTS to the second insert, it will not revert the transaction; it will just avoid inserting into user_secondarykeys, but it will still insert into users.
Any suggestions on how to implement this use case in a reliable way is appreciated. Thanks.
At first, I think that your model is pretty complicated, and I'm not sure if I understand correctly all of your requirements.
So if you receive this secondary key first, and then have to decide what to do - add the user or not - then the following will work for you:
Instead of checking the user_secondarykeys table with a SELECT statement for the occurrence of a particular secondary key, go with the following:
INSERT INTO user_secondarykeys (secondarykey, userid) VALUES ('mail:andreas@example.org', 77059e45-5fac-460b-9c4f-47528c292be0) IF NOT EXISTS;
If it applies, it means that this secondary key is not connected with any user - so there are two cases: the user doesn't exist, or the user exists and someone wants to add a new secondary key for them. The following will do the job in both cases:
UPDATE users SET name = 'Andreas', secondarykeys = secondarykeys + {'mail:andreas@example.org'} WHERE userid = 77059e45-5fac-460b-9c4f-47528c292be0;
Because inserts/updates in Cassandra are idempotent (except for counters), this will work even if there is already a user with that id in the users table - it will just add another secondary key for them.
The pros of this solution are that you remove the gap in time which can make you 'inconsistent', and you have a guarantee that no one will insert two users with the same secondary key. You specified that a user can have no secondary keys at all - in that situation you can add them straight to the users table.
I'm thinking that if I can make my INSERT transaction fail if the secondary key already exists in user_secondarykeys, that would work, because it should then also revert the insert into the users table, thanks to the atomic property of the transaction. However, I do not know of any way to make the INSERT fail if the secondary key exists. If I add IF NOT EXISTS to the second insert, it will not revert the transaction; it will just avoid inserting into user_secondarykeys, but it will still insert into users.
Since Cassandra 2.0.6 you can use conditional statements inside a batch, and if any of the conditions is not met then none of the instructions in that batch fire. This sounds great, but there is a limitation - all of the statements inside the batch have to operate on the same, single partition. Because of this, it is impossible to make a cross-partition/table conditional insert/update/delete. So in your case this:
BEGIN BATCH
INSERT INTO users (userid, name, secondarykeys) VALUES (77059e45-5fac-460b-9c4f-47528c292be0, 'Andreas', {'mail:andreas@example.org'});
INSERT INTO user_secondarykeys (secondarykey, userid) VALUES ('mail:andreas@example.org', 77059e45-5fac-460b-9c4f-47528c292be0) IF NOT EXISTS;
APPLY BATCH;
would not even pass query validation, because you are trying to operate on two different tables.
I'm not sure whether this will suit the rest of your requirements; I would need more information about your queries and the velocity/volume of the data. There are certainly other ways to model this.
It would greatly simplify the problem if every user had to have at least one secondary key (e.g. email would be a great unique key for your users table), but those are your requirements, so unless you can change them there is nothing to discuss.
Hope this will help you a bit.
Good luck!

RestKit/CoreData not updating - creating duplicates

I have an iOS 5 app which does not create any data - it simply makes a GET call to a REST web service and populates the SQLite database with those records. The initial GET works great when there are no records in the local database. However, when I make subsequent calls, I will only be returning a subset of records whose data has changed since the last GET. But what is happening is that the records are just being added again, not updating the existing records.
I have an ID field which is the primary key (or should be) and when a record comes in whose ID already exists, I want that data to be updated. If that ID does not exist, it should be an insert.
I didn't see a way to set my ID field as a 'primary key' in the data model in Xcode. I tried doing this in my didFinishLaunchingWithOptions method:
userMapping.primaryKeyAttribute = @"id";
But that alone didn't really seem to do anything.
This is my call to actually perform the GET:
// Load the object model via RestKit
[objectManager loadObjectsAtResourcePath:[@"/synchContacts" appendQueryParams:params] delegate:self];
Which seems to do everything automagically. I am lost at this point as to where I should be putting logic to check to see if the ID exists, and if so do an update vs an insert, or what.
As of the latest RestKit version (0.23) you can define the primary key like this:
[_mapping addAttributeMappingsFromDictionary:@{ @"id" : @"objectId", @"name" : @"name" }];
[_mapping setIdentificationAttributes:@[ @"objectId" ]];
where objectId is your primary key on the Core Data object.
You seem to be doing it correctly and when your didLoadObjects callback happens you should be able to query Core Data for the objects you need.
You might be having an issue with the way your fetch requests are being set up. With the latest RestKit you can use RKObjectMappingProvider's
- (void)setObjectMapping:(RKObjectMappingDefinition *)objectMapping forResourcePathPattern:(NSString *)resourcePathPattern withFetchRequestBlock:(RKObjectMappingProviderFetchRequestBlock)fetchRequestBlock;
function and have the fetchRequestBlock fetch the proper data.
RestKit doesn't really handle partial update requests very well out of the box though. You might have more luck on the RestKit google group which is very active.
Quote:
I didn't see a way to set my ID field as a 'primary key' in the data model in Xcode. I tried doing this in my didFinishLaunchingWithOptions method:
userMapping.primaryKeyAttribute = @"id";
Keep in mind, the primaryKeyAttribute is the one from your API payload, NOT a Core Data id, which Core Data manages on its own. RestKit then maps the (invisible) Core Data primary key to the specified JSON key.
