Problem counting elements in DynamoDB with Node.js using scan

I have a Node.js function that scans a table in DynamoDB (without primary sort key) and returns the number of items whose sync attribute is false.
My table:
var params = {
AttributeDefinitions: [
{
AttributeName: "barname",
AttributeType: "S"
},
{
AttributeName: "timestamp",
AttributeType: "S"
}
],
KeySchema: [
{
AttributeName: "barname",
KeyType: "HASH"
},
{
AttributeName: "timestamp",
KeyType: "RANGE"
}
],
ProvisionedThroughput: {
ReadCapacityUnits: 1,
WriteCapacityUnits: 1
},
TableName: tableName
};
The function that counts the items where sync == false:
var dynamodb = new AWS.DynamoDB({apiVersion: '2012-08-10'});
async function getCountNoSync(type){
console.log(type)
var params = {
TableName: tableName,
FilterExpression: 'sync = :sync and billing = :billing',
ExpressionAttributeValues: {
':billing' : {S: type},
':sync' : {BOOL: false}
},
};
var count = 0;
await dynamodb.scan(params).promise()
.then(function(data){
count = data.Count;
})
.catch(function(err) {
count = 0;
console.log(err);
});
return count;
}
The function works fine if I have few elements in my table (e.g. fewer than 150). If the number of elements is higher, the count variable is always 0. It looks like the scan does not find all elements.
Any idea?
Best regards

The reason you do not find all the items where sync == false is that a single scan operation only reads part of your table.
As the documentation states:
If the total number of scanned items exceeds the maximum dataset size limit of 1 MB, the scan stops and results are returned to the user as a LastEvaluatedKey value to continue the scan in a subsequent operation.
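So if your table is several hundred megabytes in size, you need to call scan() multiple times and pass the LastEvaluatedKey of each response as the ExclusiveStartKey of the next request to read the next "page" of your table. This process is called "pagination". Note also that the filter is applied after a page has been read, so a single page can report a Count of 0 even though matching items exist further on in the table.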
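The pagination described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the function name countNoSync is hypothetical, and dynamodb is assumed to be an AWS SDK v2 AWS.DynamoDB client (or any object with the same scan(params).promise() shape). The filter is taken from the question; Select: 'COUNT' is added so that only counts, not items, are returned.

```javascript
// Sums the per-page Count values by following LastEvaluatedKey until the
// whole table has been scanned.
// Assumption: `dynamodb` exposes scan(params).promise() like AWS SDK v2.
async function countNoSync(dynamodb, tableName, type) {
  const params = {
    TableName: tableName,
    Select: 'COUNT', // ask only for the number of matches, not the items
    FilterExpression: 'sync = :sync and billing = :billing',
    ExpressionAttributeValues: {
      ':billing': { S: type },
      ':sync': { BOOL: false }
    }
  };
  let count = 0;
  let lastKey;
  do {
    if (lastKey) {
      // Resume where the previous page stopped.
      params.ExclusiveStartKey = lastKey;
    }
    const page = await dynamodb.scan(params).promise();
    count += page.Count; // a single page may legitimately contain 0 matches
    lastKey = page.LastEvaluatedKey;
  } while (lastKey);
  return count;
}
```

It would be called as `const total = await countNoSync(dynamodb, tableName, type);`. Errors are left to propagate to the caller instead of being turned into a count of 0, so a failed scan is not mistaken for an empty result.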
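Paginating through the whole table takes a lot of time, and that time grows with the size of your table. The proper way of doing this would be to create an index on the sync field and then do a query() on that index. Index key attributes must be of type String, Number, or Binary, so a BOOL attribute such as sync would first have to be stored as one of those types.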
You can read more about that in the AWS documentation:
Querying and Scanning a DynamoDB Table
Reference documentation for scan()
Paginating the Results

Related

How to use waitUntilTableNotExists in DynamoDb describe table

I am trying to delete and then create a DynamoDB table using nodejs aws sdk (version 3.142.0) and I wanted to use the waiters (waitUntilTableNotExists / waitUntilTableExists), but I don't understand how they are supposed to be used and I cannot find a good example online.
Regards
Here is one way, using aws-sdk-js-v3, to wait for the table to be ready after a createTable command. Note that if you do NOT use waitUntilTableExists and instead attempt to use DescribeTableCommand, it will incorrectly report TableStatus == 'ACTIVE' even though you cannot yet read from or write to the table; you must use waitUntilTableExists.
import {
  CreateTableCommand,
  DynamoDBClient,
  waitUntilTableExists
} from "@aws-sdk/client-dynamodb";
const client = new DynamoDBClient({ region: "us-east-1" });
// partitionAndSortKeyDefinitions and columnSchema are defined elsewhere
async function createTableAndWait(tableName) {
  await client.send(
    new CreateTableCommand({
      TableName: tableName,
      AttributeDefinitions: partitionAndSortKeyDefinitions,
      KeySchema: columnSchema,
      ProvisionedThroughput: {
        ReadCapacityUnits: 4,
        WriteCapacityUnits: 2,
      },
    })
  );
  // maxWaitTime is given in seconds
  const results = await waitUntilTableExists({ client: client, maxWaitTime: 120 }, { TableName: tableName });
  if (results.state === 'SUCCESS') {
    // results.reason holds the last DescribeTable response seen by the waiter
    return results.reason.Table;
  }
  console.error(`${results.state} ${results.reason}`);
}
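The delete side of the question works the same way with waitUntilTableNotExists. The following is a rough sketch under stated assumptions: the function name recreateTable is hypothetical, and the SDK module is passed in as the sdk parameter (for example require('@aws-sdk/client-dynamodb')) only so that the sketch can be exercised with a stub instead of a real AWS account.

```javascript
// Deletes a table, waits until it is gone, recreates it, and waits until
// the new table exists.
// Assumptions: `sdk` is the @aws-sdk/client-dynamodb module object and
// `client` is a DynamoDBClient (or any object with a compatible send()).
async function recreateTable(sdk, client, createInput, maxWaitTime = 120) {
  const {
    CreateTableCommand,
    DeleteTableCommand,
    waitUntilTableExists,
    waitUntilTableNotExists
  } = sdk;
  const TableName = createInput.TableName;

  await client.send(new DeleteTableCommand({ TableName }));
  // Polls DescribeTable until the table can no longer be found.
  await waitUntilTableNotExists({ client, maxWaitTime }, { TableName });

  await client.send(new CreateTableCommand(createInput));
  // Polls DescribeTable until the new table is reported as ready.
  const result = await waitUntilTableExists({ client, maxWaitTime }, { TableName });
  return result.state;
}
```

Both waiters take the same two arguments: a configuration object with the client and maxWaitTime in seconds, and the DescribeTable input. Note that DeleteTableCommand fails with a ResourceNotFoundException when the table does not exist, so a first-time setup would need to catch that error before creating the table.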

How to create or update the same DynamoDb item?

I am using the Node.js aws-sdk and have implemented a method to create or update an item in DynamoDB.
It works well based on the Key (Id) and will either create or update the item.
My params are as follows:
let params = {
TableName: TableName,
Key: {
key: args.key
},
UpdateExpression: `set
userKey = :userKey,
content = :content`,
ExpressionAttributeValues: {
':userKey': args.userKey,
':content': args.content
},
ExpressionAttributeNames: {
}
};
I have since realised I need to conditionally check a secondary key on the update to ensure the userKey matches.
So I added:
ConditionExpression: 'userKey = :userKey',
Now the create doesn't work because it fails the condition. What is the correct way to do the create and the conditional update in one statement?
My table definitions are as follows:
AttributeDefinitions:
- AttributeName: key
AttributeType: S
- AttributeName: userKey
AttributeType: S
- AttributeName: timestamp
AttributeType: N
KeySchema:
- AttributeName: key
KeyType: HASH
You've got two options.
If userKey is actually the sort key (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html) of the table, then you can change your params as follows:
Key: {
key: args.key,
userKey: args.userKey
}
However, if userKey is just another attribute, then you can extend the condition expression as follows:
ConditionExpression: 'userKey = :userKey OR attribute_not_exists(userKey)'
This requires either that userKey matches what you were expecting, or that it hasn't been set at all (which would be the case on an upsert).
Note: this would also allow you to update an existing item that has a key but no userKey. If you're concerned about that, you can extend the condition to:
ConditionExpression: 'userKey = :userKey OR (attribute_not_exists(#key) AND attribute_not_exists(userKey))'
Because key is a DynamoDB reserved word, it is referenced here through a placeholder, which needs ExpressionAttributeNames: { '#key': 'key' } in the params.
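Put together, the second option might look like the sketch below. It assumes the DocumentClient-style params from the question (plain JavaScript values rather than typed attribute values); the function name buildUpsertParams is hypothetical, and the #key placeholder is used because key is a DynamoDB reserved word.

```javascript
// Builds update() parameters for a create-or-update ("upsert") that only
// succeeds when the item is new, or when it already belongs to args.userKey.
function buildUpsertParams(tableName, args) {
  return {
    TableName: tableName,
    Key: { key: args.key },
    UpdateExpression: 'SET userKey = :userKey, content = :content',
    // New item: neither attribute exists yet. Existing item: userKey must match.
    ConditionExpression:
      'userKey = :userKey OR (attribute_not_exists(#key) AND attribute_not_exists(userKey))',
    ExpressionAttributeNames: { '#key': 'key' },
    ExpressionAttributeValues: {
      ':userKey': args.userKey,
      ':content': args.content
    }
  };
}
```

The result can be passed straight to the update call, for example `docClient.update(buildUpsertParams(TableName, args), callback)`. When the condition fails, DynamoDB rejects the request with a ConditionalCheckFailedException, which the caller can treat as "item exists but belongs to another user".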

DynamoDB Table query items using global secondary index

I am trying to query a dynamo table with latitude and longitude for various locations. I want to get the values between certain coordinates as a user pans on the map.
The primary key for the table is city and the sort key is id. I created a global secondary index with lat as the partition key and lon as the sort key (to query for locations between two points in latitude and longitude).
I am trying to use this query:
let doc = require('dynamodb-doc');
let dynamo = new doc.DynamoDB();
...
var params = {
TableName : "locations-dev",
IndexName: "lat-lon-index",
KeyConditionExpression: "lon between :lon2 and :lon1 AND lat between :lat1 and :lat2",
ExpressionAttributeValues: {
":lat1": JSON.stringify(event.bodyJSON.east),
":lat2": JSON.stringify(event.bodyJSON.west),
":lon1": JSON.stringify(event.bodyJSON.north),
":lon2": JSON.stringify(event.bodyJSON.south)
}
};
dynamo.query(params, function (err, data) {
if (err) {
console.error('Error with ', err);
context.fail(err);
} else {
context.succeed(data);
}
});
But I am getting this error:
{
"errorMessage": "Query key condition not supported",
"errorType": "ValidationException",
"stackTrace": [
...
]
}
Here is an example item in Dynamo:
{
"id": "18",
"lat": "39.923070",
"lon": "-86.036178",
"name": "Home Depot",
"phone": "(317)915-8534",
"website": "https://corporate.homedepot.com/newsroom/battery-recycling-one-million-pounds"
}
Partition keys (even in secondary indexes) in DynamoDB can only be queried with an equality condition; only the sort key supports range conditions such as BETWEEN. This constraint follows from the internal representation: the partition key is stored as a hashed value that identifies the item's partition, and hashed values cannot be queried by range.
From "Choosing the Right DynamoDB Partition Key":
Except for scan, DynamoDB API operations require an equal operator (EQ) on the partition key for tables and GSIs. As a result, the partition key must be something that is easily queried by your application with a simple lookup (for example, using key=value, which returns either a unique item or fewer items).
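One way to stay within this constraint is to query on the table's own partition key (city) with an equality condition and narrow the result to the map bounds with a filter. This is a different technique from the lat/lon index in the question, sketched under assumptions: the function name buildBoundsQuery and the shape of the bounds object are hypothetical, DocumentClient-style values are used, and lat and lon are assumed to be stored as numbers (the example item stores them as strings, for which BETWEEN would compare lexicographically).

```javascript
// Builds query() parameters: equality on the partition key, with the
// bounding box applied as a filter on the non-key attributes.
// Assumption: lat and lon are stored as numbers, not strings.
function buildBoundsQuery(city, bounds) {
  return {
    TableName: 'locations-dev',
    KeyConditionExpression: '#city = :city',
    FilterExpression:
      '(#lat BETWEEN :south AND :north) AND (#lon BETWEEN :west AND :east)',
    ExpressionAttributeNames: { '#city': 'city', '#lat': 'lat', '#lon': 'lon' },
    ExpressionAttributeValues: {
      ':city': city,
      ':south': bounds.south,
      ':north': bounds.north,
      ':west': bounds.west,
      ':east': bounds.east
    }
  };
}
```

The filter is applied after the items for the city have been read, so it reduces the size of the response but not the read capacity consumed by the query.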

scan\query between two timestamps

I'm writing a nodejs 5.7.1 application with aws-sdk for DynamoDB.
I have a table of events that I created with the following code:
var statsTableName='bingodrive_statistics';
var eventNameColumn = 'event_name';
var eventTimeColumn = 'event_time';
var eventDataColumn = 'event_data';
var params = {
TableName: statsTableName,
KeySchema: [ // The type of of schema. Must start with a HASH type, with an optional second RANGE.
{ // Required HASH type attribute
AttributeName: eventNameColumn,
KeyType: 'HASH',
},
{ // Optional RANGE key type for HASH + RANGE tables
AttributeName: eventTimeColumn,
KeyType: 'RANGE',
}
],
AttributeDefinitions: [ // The names and types of all primary and index key attributes only
{
AttributeName: eventNameColumn,
AttributeType: 'S', // (S | N | B) for string, number, binary
},
{
AttributeName: eventTimeColumn,
AttributeType: 'N'
}
],
ProvisionedThroughput: { // required provisioned throughput for the table
ReadCapacityUnits: 1,
WriteCapacityUnits: 1,
}
};
dynamodbClient.createTable(params, callback);
As you can see, I have a Hash + Range index; the range is on event_time.
Now I want to scan or query for all the items between two specific dates,
so I'm sending the following params to the query function of DynamoDB:
{
"TableName": "bingodrive_statistics",
"KeyConditionExpression": "event_time BETWEEN :from_time and :to_time",
"ExpressionAttributeValues": {
":from_time": 1457275538691,
":to_time": 1457279138691
}
}
and I'm getting this error:
{
"message": "Query condition missed key schema element",
"code": "ValidationException",
"time": "2016-03-06T15:46:06.862Z",
"requestId": "5a672003-850c-47c7-b9df-7cd57e7bc7fc",
"statusCode": 400,
"retryable": false,
"retryDelay": 0
}
I'm new to DynamoDB. I don't know which method is best in my case, Scan or Query. Any information regarding the issue would be greatly appreciated.
You should use query. You can't query on the range key alone: to query for values between two range key values, you need to supply the hash key as well. That's because the hash key (partition key) is used to select the physical partition where the data is stored, sorted by range key (sort key). From the DynamoDB developer guide:
If the table has a composite primary key (partition key and sort key), DynamoDB calculates the hash value of the partition key in the same way as described in Data Distribution: Partition Key—but it stores all of the items with the same partition key value physically close together, ordered by sort key value.
Also, you should choose a partition key that distributes your data well. If eventName has a small total number of values, it might not be the best option (see Guidelines for Tables).
That said, if you already have eventName as your hash key and eventTime as your range key, you should query like this (sorry for the pseudocode, I normally use DynamoDBMapper):
hashKey = name_of_your_event
conditions = BETWEEN
attribute_values (eventTime1, eventTime2)
You don't need an additional Local Secondary Index or Global Secondary Index for that. Note that a GSI lets you query on attributes that are not part of the table's hash and range key, but to query data between two timestamps you will still need a range key, or will otherwise need to do a Scan.
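Applied to the table from the question, the pseudocode above might translate into the following query parameters. This is a sketch that assumes the DocumentClient-style values shown in the question (plain numbers for the timestamps); the function name buildEventRangeQuery is hypothetical.

```javascript
// Builds query() parameters: equality on the hash key (event_name) plus a
// BETWEEN range on the range key (event_time).
function buildEventRangeQuery(eventName, fromTime, toTime) {
  return {
    TableName: 'bingodrive_statistics',
    KeyConditionExpression:
      'event_name = :event_name AND event_time BETWEEN :from_time AND :to_time',
    ExpressionAttributeValues: {
      ':event_name': eventName,
      ':from_time': fromTime,
      ':to_time': toTime
    }
  };
}
```

The only difference from the failing request is the added equality condition on event_name, which is the key schema element the error message reports as missing. One such query is needed per event name.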
Use this query
function getConversationByDate(req , cb) {
var payload = req.all; //05/09/2017
var params = {
TableName: "message",
IndexName: "thread_id-timestamp-index",
KeyConditionExpression: "#mid = :mid AND #time BETWEEN :sdate AND :edate",
ExpressionAttributeNames: {
"#mid": "thread_id",
"#time": "timestamp"
},
ExpressionAttributeValues: {
":mid": payload.thread_id,
":sdate": payload.startdate,
":edate": payload.enddate
}
};
req.dynamo.query(params, function (err, data) {
cb(err, data);
});
}

Dynamodb query not working

I want to perform a query on multiple columns in DynamoDB.
I want to search on 4 fields:
rule0, which is unique and hence the hash key,
which_rule, which is a Global Index,
rule1, which is a Global Index,
rule2, which is a Global Index.
I search for rule0 first, then rule1, then rule2.
I am using the following function to match rule1 because no match was found with rule0:
function dynamoQueryRule1(rule0, rule1, funcQ) {
which_rule = "rule1";
var q = {
"KeyConditions": {
"rule0": {
"ComparisonOperator": "EQ",
"AttributeValueList": [{
"S": rule0
}]
}
},
"TableName": "tbl_scripts" + table_prefix,
"AttributesToGet": ['id'],
"ConditionalOperator": "AND",
"QueryFilter": {
"which_rule": {
"ComparisonOperator": "EQ",
"AttributeValueList": [{
"S": which_rule
}]
},
"rule1": {
"ComparisonOperator": "EQ",
"AttributeValueList": [{
"S": rule1
}]
}
}
};
var params = {
TableName: this.options.tbl_name
};
But this function never gets results for rule1, even though I have already created those rules in the DynamoDB table.
To check the data, I tried a scan and it works fine, but scan has a 1 MB limitation and is not a good fit for my use case.
Please help me find the bug in the query format.
Check whether your response object has LastEvaluatedKey != null.
If LastEvaluatedKey is set, it means you have not queried the entire dataset yet (again because of the 1 MB limit). You then need to call query again, setting ExclusiveStartKey in the request to the LastEvaluatedKey from the previous response, and append all the results together.
This similar question shows the syntax for how to do this in node.js:
Recursive Fetch All Items In DynamoDB Query using Node JS
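As a rough illustration of that loop, the sketch below collects the items of every page of a query. It is written iteratively rather than recursively, the function name queryAllItems is hypothetical, and client is assumed to expose query(params).promise() as the AWS SDK v2 clients do.

```javascript
// Runs a query repeatedly, feeding each LastEvaluatedKey back in as
// ExclusiveStartKey, and returns the items of all pages combined.
// Assumption: `client` exposes query(params).promise() like AWS SDK v2.
async function queryAllItems(client, params) {
  const items = [];
  let startKey;
  do {
    // Copy the params so the caller's object is never modified.
    const request = startKey
      ? { ...params, ExclusiveStartKey: startKey }
      : { ...params };
    const page = await client.query(request).promise();
    items.push(...(page.Items || []));
    startKey = page.LastEvaluatedKey;
  } while (startKey);
  return items;
}
```

An empty Items array on one page does not mean the search is finished; only a missing LastEvaluatedKey does.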
