I'm trying to delete some items from a DynamoDB table that has a global secondary index (GSI). Is it possible to use the batchWrite method of the DocumentClient to delete items through the GSI, or can a GSI only be used for fetching data?
var params = {
  RequestItems: {
    'Table-1': [
      {
        DeleteRequest: {
          Key: { HashKey: 'someKey' }
        }
      }
    ]
  }
};
documentClient.batchWrite(params, function(err, data) {
  if (err) console.log(err);
  else console.log(data);
});
If it's possible, please provide an example of the params.
You cannot delete from a GSI. These indexes are pretty much read only: you can’t mutate the data in the table through a global secondary index, so no inserting, deleting or updating.
You can only read from the GSI and then implement the necessary logic to delete the items in the main table by key.
Also, a batch operation doesn’t make it a whole lot more efficient to delete those items: yes, it saves on network calls (up to 25:1) but not on used write capacity.
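The read-then-delete flow described above could be sketched roughly as follows. This is only an illustration under assumptions: `deleteByIndex`, `buildDeleteBatches` and the `toKey` callback are hypothetical helpers, and `docClient` is assumed to be an AWS.DynamoDB.DocumentClient from SDK v2.

```javascript
// Sketch only: helper names are hypothetical, not part of the SDK.
const MAX_BATCH_SIZE = 25; // BatchWriteItem accepts at most 25 requests per call

// Split base-table primary keys into BatchWriteItem-sized parameter objects.
function buildDeleteBatches(tableName, keys) {
  const batches = [];
  for (let i = 0; i < keys.length; i += MAX_BATCH_SIZE) {
    const requests = keys
      .slice(i, i + MAX_BATCH_SIZE)
      .map(Key => ({ DeleteRequest: { Key } }));
    batches.push({ RequestItems: { [tableName]: requests } });
  }
  return batches;
}

// Read matching items through the index, then delete them from the base table.
// `queryParams` carries IndexName and the key condition; `toKey` maps an item
// returned by the index to its base-table primary key.
async function deleteByIndex(docClient, tableName, queryParams, toKey) {
  const keys = [];
  let params = { ...queryParams, TableName: tableName };
  do {
    const page = await docClient.query(params).promise();
    keys.push(...page.Items.map(toKey));
    params = { ...params, ExclusiveStartKey: page.LastEvaluatedKey };
  } while (params.ExclusiveStartKey);

  for (const batch of buildDeleteBatches(tableName, keys)) {
    // A production version should also retry any UnprocessedItems in the response.
    await docClient.batchWrite(batch).promise();
  }
  return keys.length;
}
```

Each delete still consumes write capacity on the base table, so the batching only reduces the number of network calls.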
Related
I have 4.5 million records in my DynamoDB table.
I want to read the id of each record in batches.
I am expecting something like offset and limit, the way we can read in MongoDB.
Is there any way to do this in Node.js without the scan method?
I have done a fair amount of research and can only find the scan method, which buffers the complete set of records from DynamoDB and then starts scanning them, which is not effective performance-wise.
Please give me a suggestion.
From my point of view, there's no problem doing scans because (according to the Scan doc):
DynamoDB paginates the results from Scan operations
You can use the ProjectionExpression parameter so that Scan only returns some of the attributes, rather than all of them
A page holds at most 1 MB of data, and you can also cap the number of items per page with the Limit parameter.
So it's basic pagination that serves the same purpose as offset and limit in MongoDB, except that the position is tracked with LastEvaluatedKey rather than a numeric offset.
Here is an example from the docs of how to perform Scan with the node.js SDK.
Now, if you want to get all the IDs in batches, you could wrap the whole thing in a Promise and resolve when there's no LastEvaluatedKey.
Below is a rough sketch of what you could do:
const AWS = require('aws-sdk');

const performScan = () => new Promise((resolve, reject) => {
  const docClient = new AWS.DynamoDB.DocumentClient();
  const params = {
    TableName: "YOUR_TABLE_NAME",
    ProjectionExpression: "id",
    Limit: 100 // only if you want something other than the default 1 MB page; 100 means 100 items
  };
  let items = [];
  const scanExecute = () => {
    docClient.scan(params, (err, result) => {
      if (err) return reject(err);
      items = items.concat(result.Items);
      if (result.LastEvaluatedKey) {
        // more pages remain: continue from where the last page stopped
        params.ExclusiveStartKey = result.LastEvaluatedKey;
        return scanExecute();
      }
      // no LastEvaluatedKey means the whole table has been read
      return resolve(items);
    });
  };
  scanExecute();
});
performScan().then(items => {
  // deal with it
});
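If holding all 4.5 million ids in memory at once is a concern, a variation is to hand each page to the caller as it arrives. The following is a sketch under assumptions: `scanPages`, `forEachIdBatch` and `handleIds` are hypothetical names, and `docClient` is assumed to be an AWS.DynamoDB.DocumentClient from SDK v2.

```javascript
// Sketch: yields one page of items at a time instead of accumulating them.
async function* scanPages(docClient, baseParams) {
  let exclusiveStartKey;
  do {
    const params = exclusiveStartKey
      ? { ...baseParams, ExclusiveStartKey: exclusiveStartKey }
      : baseParams;
    const page = await docClient.scan(params).promise();
    yield page.Items; // hand this page to the caller before fetching the next
    exclusiveStartKey = page.LastEvaluatedKey;
  } while (exclusiveStartKey);
}

// Process ids batch by batch; `handleIds` is a caller-supplied callback.
async function forEachIdBatch(docClient, tableName, handleIds) {
  const params = { TableName: tableName, ProjectionExpression: 'id', Limit: 100 };
  for await (const items of scanPages(docClient, params)) {
    await handleIds(items.map(item => item.id));
  }
}
```

This still reads the whole table and consumes the same read capacity; it only changes how much is buffered on the client.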
The first thing to know about DynamoDB is that it is a key-value store with support for secondary indexes.
DynamoDB is a bad choice if the application often has to iterate over the entire data set without using indexes (primary or secondary), because the only way to do that is to use the Scan API.
DynamoDB table scans are (a few things I can think of):
Expensive (I mean $$$)
Slow for big data sets
Liable to use up the provisioned throughput
If you know the primary keys of all the items in DynamoDB (through some external knowledge, e.g. the primary key is an auto-incremented value or is referenced in another DB) then you can use BatchGetItem or Query.
So if it is a one-off thing then Scan is your only option; otherwise you should look into refactoring your application to remove this scenario.
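As an illustration of the known-keys case, the ids can be split into BatchGetItem requests of at most 100 keys each. This is a sketch that assumes the partition key attribute is called `id` and that the ids happen to be the integers 1..250; `buildBatchGetRequests` is a hypothetical helper.

```javascript
const MAX_KEYS_PER_BATCH_GET = 100; // BatchGetItem accepts at most 100 keys per request

// Build one batchGet parameter object per chunk of ids.
function buildBatchGetRequests(tableName, ids) {
  const requests = [];
  for (let i = 0; i < ids.length; i += MAX_KEYS_PER_BATCH_GET) {
    const Keys = ids.slice(i, i + MAX_KEYS_PER_BATCH_GET).map(id => ({ id }));
    requests.push({ RequestItems: { [tableName]: { Keys } } });
  }
  return requests;
}

// Example: ids assumed to be known in advance (here 1..250).
const ids = Array.from({ length: 250 }, (_, i) => i + 1);
const requests = buildBatchGetRequests('YOUR_TABLE_NAME', ids);
console.log(requests.length); // 3 (100 + 100 + 50 keys)
```

Each element can then be passed to `docClient.batchGet(...)`; a real implementation should also re-request any `UnprocessedKeys` returned in the response.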
I am trying to query dynamodb using the following code:
const AWS = require('aws-sdk');
let dynamo = new AWS.DynamoDB.DocumentClient({
  service: new AWS.DynamoDB({
    apiVersion: "2012-08-10",
    region: "us-east-1"
  }),
  convertEmptyValues: true
});
dynamo.query({
  TableName: "Jobs",
  KeyConditionExpression: 'sstatus = :st',
  ExpressionAttributeValues: {
    ':st': 'processing'
  }
}, (err, resp) => {
  console.log(err, resp);
});
When I run this, I get an error saying:
ValidationException: Query condition missed key schema element: id
I do not understand this. I have defined id as the partition key for the jobs table and need to find all the jobs that are in processing status.
You're trying to run a query using a condition that does not include the partition key, and DynamoDB queries do not work that way: a Query must always specify the partition key. You would need to do a scan for the info in your case; however, I don't think that is the best option.
I think you want to set up a global secondary index and use that to query for the processing status.
In another answer #smcstewart responded to this question, but he provides a link instead of explaining why this error occurs. I want to add a brief comment in the hope that it will save you time.
The AWS docs on Querying a Table state that you can do WHERE-condition queries (e.g. the SQL query SELECT * FROM Music WHERE Artist='No One You Know') in the DynamoDB way, but with one important caveat:
You MUST specify an EQUALITY condition for the PARTITION key, and you can optionally provide another condition for the SORT key.
Meaning you can only use key attributes with Query. Doing it in any other way would mean that DynamoDB would run a full scan for you which is NOT efficient - less efficient than using Global secondary indexes.
So if you need to query on non-key attributes, using Query is usually NOT an option. The best option is to use global secondary indexes, as suggested by #smcstewart.
I found this guide to be useful to create a Global secondary index manually.
If you need to add it using CloudFormation here is a relevant page.
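Besides the console and CloudFormation, an index can also be added with the UpdateTable API. Below is a sketch of the parameters; the index name `sstatus-index`, the string attribute type and the throughput numbers are assumptions for illustration, and `buildCreateIndexParams` is a hypothetical helper.

```javascript
// Build UpdateTable parameters that add a global secondary index keyed on one attribute.
function buildCreateIndexParams(tableName, indexName, attributeName) {
  return {
    TableName: tableName,
    // Every attribute used in an index key schema must be declared here.
    AttributeDefinitions: [{ AttributeName: attributeName, AttributeType: 'S' }],
    GlobalSecondaryIndexUpdates: [{
      Create: {
        IndexName: indexName,
        KeySchema: [{ AttributeName: attributeName, KeyType: 'HASH' }],
        Projection: { ProjectionType: 'ALL' },
        // Needed for provisioned-mode tables; omit for on-demand tables.
        ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 },
      },
    }],
  };
}

const createIndexParams = buildCreateIndexParams('Jobs', 'sstatus-index', 'sstatus');
// With SDK v2 this goes to the low-level client, not the DocumentClient:
// new AWS.DynamoDB({ region: 'us-east-1' }).updateTable(createIndexParams).promise();
console.log(JSON.stringify(createIndexParams, null, 2));
```

Once the index is active, a Query that sets `IndexName` to the new index can use `sstatus` in its KeyConditionExpression.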
I was getting this error for a different scenario. Here is my scenario.
(It's very unlikely that anyone else ends up with this case, but just in case.)
I had a query working on a Table (say table A). Table A had a partition key m_id and sort key u_id.
I had a query to fetch data using m_id. The query was working.
'''
var queryParams = {
  ExpressionAttributeValues: {
    ':m_id': mId
  },
  KeyConditionExpression: 'm_id = :m_id',
  TableName: "A"
};
let connections = await docClient.query(queryParams).promise();
'''
I created another Table say Table B. I made some errors in naming keys so I simply deleted and created a table with the same name again, Table B. Table B had partition key m_id, and sort key s_id.
I copied and pasted the same query I was using for Table A and changed only the table name, because the partition key had the same name.
To my shock, I got this exception.
"ValidationException: Query condition missed key schema element"
I rechecked all the names, I compared the query with the working query. Everything was fine.
I thought that maybe it had something to do with my deleting and recreating Table B. So I created a fresh table with a new name, Table B2, with the same key names as Table B.
In my query that was throwing exceptions, I changed only the Table name from B to B2.
And the Exception was gone.
If you are getting this on a fresh table, where no query has worked earlier, creating a new Table with a new name is an option.
If you delete a table only to change partition key names, it may be safer to use a new name for the table as well (DynamoDB could be referring to metadata by table name rather than by internal identifier, so old metadata may linger even after a table is deleted; this is just a guess based on the case I faced).
EDIT:2022-July-12
This error keeps coming back. My answer above was helpful, but there is one more case: there was a trailing space in the name of a key in the table, and DynamoDB does not even check for spaces in key names.
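A quick way to catch this kind of mismatch is to compare the key names the query uses against the table's key schema. The sketch below assumes `keySchema` has the shape of `Table.KeySchema` in a DescribeTable response; `findKeyNameProblems` is a hypothetical helper, not an SDK function.

```javascript
// Report key names with stray whitespace and a partition key missing from the query.
function findKeyNameProblems(keySchema, queriedNames) {
  const problems = [];
  for (const { AttributeName } of keySchema) {
    if (AttributeName !== AttributeName.trim()) {
      problems.push(`key name ${JSON.stringify(AttributeName)} has leading/trailing whitespace`);
    }
  }
  const partitionKey = keySchema.find(k => k.KeyType === 'HASH');
  if (partitionKey && !queriedNames.includes(partitionKey.AttributeName)) {
    problems.push(`query does not use partition key ${JSON.stringify(partitionKey.AttributeName)}`);
  }
  return problems;
}

const schema = [
  { AttributeName: 'm_id ', KeyType: 'HASH' }, // note the trailing space
  { AttributeName: 's_id', KeyType: 'RANGE' },
];
console.log(findKeyNameProblems(schema, ['m_id']));
```

Printing the names with JSON.stringify makes the trailing space visible inside the quotes.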
You have to create a global secondary index for the status field.
Then your code could look something like this:
dynamo.query({
  TableName: "Jobs",
  IndexName: 'status',
  KeyConditionExpression: '#s = :st',
  ExpressionAttributeValues: {
    ':st': 'processing'
  },
  ExpressionAttributeNames: {
    '#s': 'status',
  },
}, (err, resp) => {
  console.log(err, resp);
});
Note: a scan operation is indeed very costly, especially if your table is huge.
I solved the problem using AWS.DynamoDB.DocumentClient() with scan, for example (Node.js):
var docClient = new AWS.DynamoDB.DocumentClient();
var params = {
  TableName: "product",
  FilterExpression: "#cg = :data",
  ExpressionAttributeNames: {
    "#cg": "categoria",
  },
  ExpressionAttributeValues: {
    ":data": category,
  }
};
docClient.scan(params, onScan);
function onScan(err, data) {
  if (err) {
    // for the log in server
    console.error("Unable to scan the table. Error JSON:", JSON.stringify(err, null, 2));
    res.json(err);
  } else {
    console.log("Scan succeeded.");
    res.json(data);
  }
}
I am using AWS.DynamoDB.DocumentClient in a Node.js program to fetch items from multiple DynamoDB tables. To keep the code simple, I chose to use the BatchGetItem/BatchGet method.
The challenge is that I need to fetch items based on a global secondary index, e.g. name+age, rather than the primary key defined when the table was created. I went through the BatchGetItem/BatchGet documentation but did not see any parameter for using a global secondary index.
I ran a test with the following code
var params = {
  RequestItems: {
    'Table-1': {
      Keys: [
        {
          name: 'abc',
          age: 18,
        },
      ]
    }
  }
};
var docClient = new AWS.DynamoDB.DocumentClient();
docClient.batchGet(params, function(err, data) {
  if (err) console.log(err);
  else console.log(data);
});
And got the following error.
> ValidationException: The provided key element does not match the schema
Does it mean BatchGetItem/BatchGet can't use Global Secondary Index, and I have to read from tables one by one?
I don't believe BatchGetItem can use a global secondary index. You will likely have to query one by one.
INDEXES - The response includes the aggregate ConsumedCapacity for the operation, together with ConsumedCapacity for each table and secondary index that was accessed. Note that some operations, such as GetItem and BatchGetItem , do not access any indexes at all. In these cases, specifying INDEXES will only return ConsumedCapacity information for table(s).
Source: https://docs.aws.amazon.com/cli/latest/reference/dynamodb/batch-get-item.html
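Querying one by one could look roughly like the sketch below: one Query per set of index key values, run in parallel. The index name is up to you (for example a hypothetical `name-age-index`), `buildIndexQuery` and `queryIndexForKeys` are hypothetical helpers, and `docClient` is assumed to be an AWS.DynamoDB.DocumentClient from SDK v2.

```javascript
// Build Query params for one set of index key values, e.g. { name: 'abc', age: 18 }.
function buildIndexQuery(tableName, indexName, keyValues) {
  const names = {};
  const values = {};
  const conditions = Object.keys(keyValues).map((attr, i) => {
    names[`#k${i}`] = attr; // placeholders sidestep reserved words such as "name"
    values[`:v${i}`] = keyValues[attr];
    return `#k${i} = :v${i}`;
  });
  return {
    TableName: tableName,
    IndexName: indexName,
    KeyConditionExpression: conditions.join(' AND '),
    ExpressionAttributeNames: names,
    ExpressionAttributeValues: values,
  };
}

// Run one Query per key set in parallel and flatten the first page of each result.
async function queryIndexForKeys(docClient, tableName, indexName, keySets) {
  const pages = await Promise.all(
    keySets.map(keys =>
      docClient.query(buildIndexQuery(tableName, indexName, keys)).promise())
  );
  return pages.flatMap(page => page.Items);
}
```

For brevity this ignores `LastEvaluatedKey`, so it only suits lookups where each key set matches less than one page of items.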
How can I batchGet from a global secondary index in DynamoDB?
These params give me a schema error because this hash key exists only in the index; the main table has a different one.
const params = {
  RequestItems: {
    "MyTableName": {
      Keys: [
        {
          "ThisHashKeyIsOnlyInIndexTable": value
        }
      ]
    }
  }
};
docClient.batchGet(params, (err, data) => {
  // ...
});
The docs don't even mention how to batchGet from index(es) only.
Unfortunately, GetItem and BatchGetItem cannot access any indexes. You can't pass IndexName in the params the way you can with the Query API.
Highlighted the point relevant to the question.
ReturnConsumedCapacity — (String) Determines the level of detail about provisioned throughput consumption that is returned in the response:
INDEXES - The response includes the aggregate ConsumedCapacity for the operation, together with ConsumedCapacity for each table and secondary index that was accessed. Note that some operations, such as GetItem and BatchGetItem, do not access any indexes at all. In these cases, specifying INDEXES will only return ConsumedCapacity information for table(s).
TOTAL - The response includes only the aggregate ConsumedCapacity for the operation.
NONE - No ConsumedCapacity details are included in the response.
I am trying to determine the existence of an object to decide whether to create a new object with a new key or to update an existing object. The goal here is to match on two Secondary Indexes.
db.query(bucket, {end: null, definition_id: id}, function(err, data) {
  if (err) {
    res.send(err);
  } else {
    if (data.length === 0) {
      // write new obj
    } else {
      // add to current obj
    }
  }
});
If there is an easy way to do this with the HTTP API I would be game for that, too, just can't seem to find it in the docs.
Thanks.
Riak's secondary indexing doesn't support querying two indexes simultaneously; you would need to query each index separately and then intersect the result sets.
However, if you need to routinely query the same pair of indexes, you can create a composite index in addition to the others. So if you are indexing end and definition_id, also create an end-def index whose values are the end and definition_id concatenated with a separator.
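Both options can be sketched in a few lines. The helper names, the `|` separator and the way a `null` end is encoded as the string `null` are assumptions for illustration; how the keys are fetched from each index depends on the client library in use.

```javascript
// Option 1: intersect the keys returned by two separate index queries.
function intersectKeys(keysA, keysB) {
  const inB = new Set(keysB);
  return keysA.filter(key => inB.has(key));
}

// Option 2: build the value for a composite index at write time.
// The separator is an arbitrary choice; it must not occur in either part.
function compositeIndexValue(end, definitionId, separator = '|') {
  return [String(end), String(definitionId)].join(separator);
}

console.log(intersectKeys(['a', 'b', 'c'], ['b', 'c', 'd'])); // [ 'b', 'c' ]
console.log(compositeIndexValue(null, 42)); // null|42
```

The same `compositeIndexValue` call has to be used when writing the object and when querying the end-def index, so that both sides produce identical values.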