AWS DynamoDB Node.js scan - certain number of results - node.js

I am trying to get the first 10 items that satisfy a condition from DynamoDB using an AWS Lambda function. I tried the Limit parameter, but according to the documentation
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/DynamoDB.html#scan-property
it is the "maximum number of items to evaluate (not necessarily the number of matching items)".
How can I get the first 10 items that satisfy my condition?
var AWS = require('aws-sdk');
var db = new AWS.DynamoDB();

exports.handler = function(event, context) {
    var params = {
        TableName: "Events", //"StreamsLambdaTable",
        ProjectionExpression: "ID, description, endDate, imagePath, locationLat, locationLon, #nm, startDate, #tp, userLimit", // specifies the attributes you want in the scan result
        FilterExpression: "locationLon between :lower_lon and :higher_lon and locationLat between :lower_lat and :higher_lat",
        ExpressionAttributeNames: {
            "#nm": "name",
            "#tp": "type",
        },
        ExpressionAttributeValues: {
            ":lower_lon": {"N": event.low_lon},
            ":higher_lon": {"N": event.high_lon},
            ":lower_lat": {"N": event.low_lat},
            ":higher_lat": {"N": event.high_lat}
        }
    };
    db.scan(params, function(err, data) {
        if (err) {
            console.log(err); // an error occurred
        } else {
            data.Items.forEach(function(record) {
                console.log(record.name.S);
            });
            context.succeed(data.Items);
        }
    });
};

I think you already know the reason behind this: the distinction that DynamoDB makes between ScannedCount and Count. As per the documentation:
ScannedCount — the number of items that were queried or scanned, before any filter expression was applied to the results.
Count — the number of items that were returned in the response.
The fix for that is documented right above this:
For either a Query or Scan operation, DynamoDB might return a LastEvaluatedKey value if the operation did not return all matching items in the table. To get the full count of items that match, take the LastEvaluatedKey value from the previous request and use it as the ExclusiveStartKey value in the next request. Repeat this until DynamoDB no longer returns a LastEvaluatedKey value.
So, the answer to your question is: use the LastEvaluatedKey from the DynamoDB response and Scan again, repeating until you have collected 10 matching items.
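A minimal sketch of that loop using the scan parameters from the question; the 10-item cap, the recursive scanPage helper, and the trimmed-down parameter list are illustrative additions, not part of the original code:

var AWS = require('aws-sdk');
var db = new AWS.DynamoDB();

exports.handler = function(event, context) {
    // Same scan parameters as in the question (ProjectionExpression etc. omitted for brevity).
    var params = {
        TableName: "Events",
        FilterExpression: "locationLon between :lower_lon and :higher_lon and locationLat between :lower_lat and :higher_lat",
        ExpressionAttributeValues: {
            ":lower_lon": {"N": event.low_lon},
            ":higher_lon": {"N": event.high_lon},
            ":lower_lat": {"N": event.low_lat},
            ":higher_lat": {"N": event.high_lat}
        }
    };
    var matches = [];

    function scanPage(startKey) {
        if (startKey) {
            params.ExclusiveStartKey = startKey;
        }
        db.scan(params, function(err, data) {
            if (err) {
                return context.fail(err);
            }
            matches = matches.concat(data.Items);
            // Keep paging until 10 matches are collected or the table is exhausted.
            if (matches.length < 10 && data.LastEvaluatedKey) {
                return scanPage(data.LastEvaluatedKey);
            }
            context.succeed(matches.slice(0, 10));
        });
    }

    scanPage();
};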

Related

How can I filter entries in dynamodb which has time_stamp more than 1 day?

I have a Lambda function which queries a DynamoDB table userDetailTable, and I want to filter only the entries whose timestamp (recorded in ms) exceeds 1 day (86400000 ms) when subtracted from new Date().getTime(). Can anyone suggest the right way of doing this?
The table has a GSI on user_status, which has the value 'active' for all entries, and epoch_timestamp (timestamp in ms) is the attribute used in the filter expression.
In the Lambda I am trying to subtract epoch_timestamp from new Date().getTime() inside the query, which I am not sure is even possible. Below is the code containing my query.
function getUserDetails(callback){
    var params = {
        TableName: 'userDetailTable',
        IndexName: 'user_status-index',
        KeyConditionExpression: 'user_status = :user_status',
        FilterExpression: `expiration_time - ${new Date().getTime()} > :time_difference`,
        ExpressionAttributeValues: {
            ':user_status': 'active',
            ':time_difference': '86400000' // 1 day in ms
        }
    };
    docClient.query(params, function(err, data) {
        if(err) {
            callback(err, null)
        } else {
            callback(null, data)
        }
    })
}
Here's a rewrite of your code:
function getUserDetails(callback){
    var params = {
        TableName: 'userDetailTable',
        IndexName: 'user_status-index',
        KeyConditionExpression: 'user_status = :user_status',
        FilterExpression: 'epoch_timestamp > :time_threshold_ms',
        ExpressionAttributeValues: {
            ':user_status': 'active',
            ':time_threshold_ms': Date.now() - 86400000
        }
    };
    docClient.query(params, function(err, data) {
        if(err) {
            callback(err, null)
        } else {
            callback(null, data)
        }
    })
}
Specifically, you cannot compute any date inside the FilterExpression. Instead, you compare the item's epoch_timestamp attribute with :time_threshold_ms, which you compute once (for all items inspected by the query) in ExpressionAttributeValues.
Please note, though, that you can make this more efficient if you define a GSI that uses epoch_timestamp as its sort key (user_status can remain the partition key). Then, instead of placing the condition in the FilterExpression, you move it into the KeyConditionExpression, as sketched below.
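A minimal sketch of that GSI-backed query, assuming an index named user_status-epoch_timestamp-index with user_status as its partition key and epoch_timestamp as its sort key; the index name is illustrative:

var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient();

function getUserDetails(callback) {
    var params = {
        TableName: 'userDetailTable',
        // Assumed GSI: user_status (partition key) + epoch_timestamp (sort key).
        IndexName: 'user_status-epoch_timestamp-index',
        // The time condition is part of the key condition, so DynamoDB only
        // reads the matching items instead of filtering them after the read.
        KeyConditionExpression: 'user_status = :user_status AND epoch_timestamp > :time_threshold_ms',
        ExpressionAttributeValues: {
            ':user_status': 'active',
            ':time_threshold_ms': Date.now() - 86400000 // now minus 1 day in ms
        }
    };
    docClient.query(params, function(err, data) {
        if (err) {
            callback(err, null);
        } else {
            callback(null, data);
        }
    });
}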
Also, when you use a FilterExpression you need to check the LastEvaluatedKey of the response. If it is not empty, you need to issue a follow-up query with LastEvaluatedKey copied into the request's ExclusiveStartKey. Why? Due to filtering, it is possible that you will get no results from the "chunk" (or "page") examined by DynamoDB, and DynamoDB only examines a single chunk per query invocation. Issuing a follow-up query with ExclusiveStartKey tells DynamoDB to inspect the next chunk.
(See https://dzone.com/articles/query-dynamodb-items-withnodejs for further details.)
Alternatively, if you do not use filtering, you are advised to pass a Limit value in the request to tell DynamoDB to stop after the desired number of items. However, if you do use filtering, do not pass a Limit value, as it will reduce the size of each chunk and you will need many more follow-up queries to get your data.
You cannot perform a calculation in the filter expression but you can calculate it outside and use the result with a new inequality.
I think you are looking for items expiring after one day from now.
Something like
FilterExpression: 'expiration_time > :max_time',
ExpressionAttributeValues: {
    ':user_status': 'active',
    ':max_time': new Date().getTime() + 86400000 // 1 day in ms, i.e. one day from now
}

DynamoDB Scan with FilterExpression in nodejs

I'm trying to retrieve all items from a DynamoDB table that match a FilterExpression, and although all of the items are scanned and half do match, the expected items aren't returned.
I have the following in an AWS Lambda function running on Node.js 6.10:
var AWS = require("aws-sdk"),
documentClient = new AWS.DynamoDB.DocumentClient();
function fetchQuotes(category) {
let params = {
"TableName": "quotient-quotes",
"FilterExpression": "category = :cat",
"ExpressionAttributeValues": {":cat": {"S": category}}
};
console.log(`params=${JSON.stringify(params)}`);
documentClient.scan(params, function(err, data) {
if (err) {
console.error(JSON.stringify(err));
} else {
console.log(JSON.stringify(data));
}
});
}
There are 10 items in the table, one of which is:
{
    "category": "ChuckNorris",
    "quote": "Chuck Norris does not sleep. He waits.",
    "uuid": "844a0af7-71e9-41b0-9ca7-d090bb71fdb8"
}
When testing with category "ChuckNorris", the log shows:
params={"TableName":"quotient-quotes","FilterExpression":"category = :cat","ExpressionAttributeValues":{":cat":{"S":"ChuckNorris"}}}
{"Items":[],"Count":0,"ScannedCount":10}
The scan call returns all 10 items when I only specify TableName:
params={"TableName":"quotient-quotes"}
{"Items":[<snip>,{"category":"ChuckNorris","uuid":"844a0af7-71e9-41b0-9ca7-d090bb71fdb8","CamelCase":"thevalue","quote":"Chuck Norris does not sleep. He waits."},<snip>],"Count":10,"ScannedCount":10}
You do not need to specify the type ("S") in your ExpressionAttributeValues because you are using the DynamoDB DocumentClient. Per the documentation:
The document client simplifies working with items in Amazon DynamoDB by abstracting away the notion of attribute values. This abstraction annotates native JavaScript types supplied as input parameters, as well as converts annotated response data to native JavaScript types.
It's only when you're using the raw DynamoDB object via new AWS.DynamoDB() that you need to specify the attribute types (i.e., the simple objects keyed on "S", "N", and so on).
With DocumentClient, you should be able to use params like this:
const params = {
    TableName: 'quotient-quotes',
    FilterExpression: '#cat = :cat',
    ExpressionAttributeNames: {
        '#cat': 'category',
    },
    ExpressionAttributeValues: {
        ':cat': category,
    },
};
Note that I also moved the field name into an ExpressionAttributeNames value, for consistency and safety. This is good practice because certain field names are DynamoDB reserved words and will break your requests if used directly in an expression.
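Putting that together, a version of the question's fetchQuotes using the DocumentClient-friendly params might look like this sketch:

var AWS = require("aws-sdk"),
    documentClient = new AWS.DynamoDB.DocumentClient();

function fetchQuotes(category) {
    const params = {
        TableName: 'quotient-quotes',
        FilterExpression: '#cat = :cat',
        ExpressionAttributeNames: {
            '#cat': 'category',
        },
        ExpressionAttributeValues: {
            ':cat': category, // plain string; the DocumentClient wraps it as {"S": ...} itself
        },
    };
    console.log(`params=${JSON.stringify(params)}`);
    documentClient.scan(params, function(err, data) {
        if (err) {
            console.error(JSON.stringify(err));
        } else {
            console.log(JSON.stringify(data)); // Items now contains the matching quotes
        }
    });
}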
I was looking for a solution that combined KeyConditionExpression with FilterExpression and eventually I worked this out.
Here aws is the uuid, Id is an assigned unique number prefixed with the text 'form' so I can tell I have form data, and optinSite lets me find enquiries from a particular site. Other data is stored as well; this is all I need to get the packet.
Maybe this can be of help to you:
let optinSite = 'https://theDomainIWantedTFilterFor.com/';
let aws = 'eu-west-4:EXAMPLE-aaa1-4bd8-9ean-1768882l1f90';

let item = {
    TableName: 'Table',
    KeyConditionExpression: "aws = :Aw and begins_with(Id, :form)",
    FilterExpression: "optinSite = :Os",
    ExpressionAttributeValues: {
        ":Aw": { S: aws },
        ":form": { S: 'form' },
        ":Os": { S: optinSite }
    }
};

Retrieving rows from dynamoDB from Lambda based on primary key?

Below is the code of my Lambda function. I'm having trouble querying rows based on the timestamps. My plan is to get all the rows from 5 seconds before the current time up to the current time, in milliseconds. TimeMillis (Number) stores the current time in milliseconds and is the primary key, and the range key is PhoneId (String). Please help me with a solution, or is there any way to overcome the problem?
I'm not able to get the output; it throws an error.
'use strict';
var AWS = require("aws-sdk");
AWS.config.update({
    region: "us-east-1",
});
var docClient = new AWS.DynamoDB.DocumentClient();

exports.handler = function(event, context, callback) {
    var timemillis = new Date().getTime();
    var timemillis1 = timemillis - 5000;
    var params = {
        TableName: 'Readings',
        KeyConditionExpression: "TimeMillis = :tm and TimeMillis BETWEEN :from AND :to",
        ExpressionAttributeValues: {
            ":tm" : "TimMillis",
            ":from" : timemillis1,
            ":to" : timemillis
        }
    };
    docClient.query(params, function(err, data) {
        if(err){
            callback(err, null);
        } else {
            callback(null, data);
        }
    });
};
Here is my DynamoDB table image.
You cannot apply multiple conditions to the same key attribute inside a KeyConditionExpression. What you can do is use a FilterExpression together with the KeyConditionExpression to narrow down the result set.
Quoting from the documentation,
Use the KeyConditionExpression parameter to provide a specific value for the partition key. The Query operation will return all of the items from the table or index with that partition key value. You can optionally narrow the scope of the Query operation by specifying a sort key value and a comparison operator in KeyConditionExpression. To further refine the Query results, you can optionally provide a FilterExpression. A FilterExpression determines which items within the results should be returned to you. All of the other results are discarded.
Also, note that the only supported test on the partition key is equality; range conditions (such as BETWEEN) can only be applied to the sort key:
partitionKeyName = :partitionkeyval AND sortKeyName = :sortkeyval
Another way is to create a GSI which supports the additional querying. By the way, traditional RDBMS thinking does not work well with DynamoDB; you can read about best practices here.
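For illustration, a query like the one below only works if the partition key is tested for equality and the BETWEEN condition is applied to the sort key. The sketch assumes a hypothetical key schema (or GSI) with PhoneId as the partition key and TimeMillis as the sort key, which is the reverse of how the question's table is currently keyed:

var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient({ region: 'us-east-1' });

var now = Date.now();
var params = {
    TableName: 'Readings',
    // Assumed key schema: PhoneId (partition key), TimeMillis (sort key).
    KeyConditionExpression: 'PhoneId = :phone AND TimeMillis BETWEEN :from AND :to',
    ExpressionAttributeValues: {
        ':phone': 'some-phone-id', // placeholder partition key value
        ':from': now - 5000,       // 5 seconds ago
        ':to': now
    }
};

docClient.query(params, function(err, data) {
    if (err) {
        console.error(err);
    } else {
        console.log(data.Items); // readings from the last 5 seconds for that phone
    }
});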

How to scan between date range using Lambda and DynamoDB?

I'm attempting to scan between a date range using a Node Lambda function. I have the data being scanned correctly, but I can't seem to get the date expression to work correctly.
var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB({apiVersion: '2012-08-10'});

exports.handler = function(event, context) {
    var tableName = "MyDDBTable";
    dynamodb.scan({
        TableName: tableName,
        FilterExpression: "start_date < :start_date",
        ExpressionAttributeValues: {
            ":start_date": {
                "S": "2016-12-01"
            }
        }
    }, function(err, data) {
        context.succeed(data);
    });
};
This currently doesn't try to return a range; it's just comparing against a single date right now. I didn't want to add an AND to the expression until I knew this part was working.
A sample document in my DynamoDB is structured like so:
{
    "end_date": {
        "S": "2016-12-02"
    },
    "name": {
        "S": "Name of document"
    },
    "start_date": {
        "S": "2016-10-10"
    },
    "document_id": {
        "N": "7"
    }
}
The document_id is my primary key. I'm pretty new to this whole Lambda / DynamoDB combination, so I may have this completely set up wrong, but this is what I've managed to put together through my research.
What I'm ultimately trying to achieve is: given a start date and an end date, return all DynamoDB documents whose date range falls within it. Any help would be greatly appreciated.
Firstly, the scan operation is correct. The dynamodb.scan should be executed in a loop until LastEvaluatedKey is no longer present in the response. Please refer to this blog.
The Lambda is not returning the result because the matching data was probably not found in the first scanned page. If you keep scanning until LastEvaluatedKey is no longer returned, the Lambda is likely to return the result.
For Query and Scan operations, DynamoDB calculates the amount of consumed provisioned throughput based on item size, not on the amount of data that is returned to an application.
If you query or scan for specific attributes that match values that amount to more than 1 MB of data, you'll need to perform another Query or Scan request for the next 1 MB of data. To do this, take the LastEvaluatedKey value from the previous request, and use that value as the ExclusiveStartKey in the next request. This approach will let you progressively query or scan for new data in 1 MB increments.
BETWEEN operator sample:
FilterExpression: "start_date BETWEEN :date1 and :date2"

How can I do DynamoDB limit after filtering?

I would like to implement a DynamoDB Scan with the following logic:
Scanning -> Filtering(boolean true or false) -> Limiting(for pagination)
However, I have only been able to implement a Scan with this logic:
Scanning -> Limiting(for pagination) -> Filtering(boolean true or false)
How can I achieve this?
Below is an example I have written that implements the second Scan logic:
var parameters = {
    TableName: this.tableName,
    Limit: queryStatement.limit
};
if ('role' in queryStatement) {
    parameters.FilterExpression = '#role = :role';
    parameters.ExpressionAttributeNames = {
        '#role': 'role'
    };
    parameters.ExpressionAttributeValues = {
        ':role': queryStatement.role
    };
}
if ('startKey' in queryStatement) {
    parameters.ExclusiveStartKey = { id: queryStatement.startKey };
}
this.documentClient.scan(parameters, (errorResult, result) => {
    if (errorResult) {
        errorResult._status = 500;
        return reject(errorResult);
    }
    return resolve(result);
});
This code works like the second one:
Scanning -> Limiting -> Filtering
DynamoDB's Limit works as quoted below (i.e. the second approach in your post) by design, so there is no way to change that behaviour on the server side.
LastEvaluatedKey should be used to get the remaining data on subsequent scans.
Scanning -> Limiting(for pagination) -> Filtering(boolean true or false)
In a request, set the Limit parameter to the number of items that you want DynamoDB to process before returning results.
In a response, DynamoDB returns all the matching results within the scope of the Limit value. For example, if you issue a Query or a Scan request with a Limit value of 6 and without a filter expression, DynamoDB returns the first six items in the table that match the specified key conditions in the request (or just the first six items in the case of a Scan with no filter). If you also supply a FilterExpression value, DynamoDB will return the items in the first six that also match the filter requirements (the number of results returned will be less than or equal to 6).
For either a Query or Scan operation, DynamoDB might return a LastEvaluatedKey value if the operation did not return all matching items in the table. To get the full count of items that match, take the LastEvaluatedKey value from the previous request and use it as the ExclusiveStartKey value in the next request. Repeat this until DynamoDB no longer returns a LastEvaluatedKey value.
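There is no server-side option that applies Limit after the filter, but the Scanning -> Filtering -> Limiting order can be approximated client-side by accumulating filtered results page by page until the requested number of items is reached. A rough DocumentClient sketch of that idea, reusing the queryStatement shape from the question (everything else is illustrative):

const AWS = require('aws-sdk');
const documentClient = new AWS.DynamoDB.DocumentClient();

async function scanWithPostFilterLimit(tableName, queryStatement) {
    let items = [];
    let startKey = 'startKey' in queryStatement ? { id: queryStatement.startKey } : undefined;

    do {
        const parameters = {
            TableName: tableName,
            FilterExpression: '#role = :role',
            ExpressionAttributeNames: { '#role': 'role' },
            ExpressionAttributeValues: { ':role': queryStatement.role },
            ExclusiveStartKey: startKey
        };
        const result = await documentClient.scan(parameters).promise();
        items = items.concat(result.Items);
        startKey = result.LastEvaluatedKey;
    } while (items.length < queryStatement.limit && startKey);

    return {
        // Note: if the last page contributed more matches than needed, a real
        // implementation must build its own ExclusiveStartKey from the last item
        // it returns (as the Go example further down does), otherwise the surplus
        // matches would be skipped when the next page is requested.
        Items: items.slice(0, queryStatement.limit),
        LastEvaluatedKey: startKey
    };
}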
Use --max-items=2 instead of --limit=2; --max-items is applied by the CLI on the client side after filtering, whereas --limit is passed to DynamoDB and applied before filtering.
Sample query with max-items:
aws dynamodb query --table-name=limitTest --key-condition-expression="gsikey=:hash AND gsisort>=:sort" --expression-attribute-values '{ ":hash":{"S":"1"}, ":sort":{"S":"1"}, ":levels":{"N":"10"}}' --filter-expression="levels >= :levels" --scan-index-forward --max-items=2 --projection-expression "levels,key1" --index-name=gsikey-gsisort-index
Sample query with limit:
aws dynamodb query --table-name=limitTest --key-condition-expression="gsikey=:hash AND gsisort>=:sort" --expression-attribute-values '{ ":hash":{"S":"1"}, ":sort":{"S":"1"}, ":levels":{"N":"10"}}' --filter-expression="levels >= :levels" --scan-index-forward --limit=2 --projection-expression "levels,key1" --index-name=gsikey-gsisort-index
If there is just one field that is of interest for the pagination, you could create an index with that field as a key. Then you do not need to issue repeated requests to collect the number of items in the limit.
You can also pass a Limit in the query parameters now:
var params = {
    TableName: "message",
    IndexName: "thread_id-timestamp-index",
    KeyConditionExpression: "#mid = :mid",
    ExpressionAttributeNames: {
        "#mid": "thread_id"
    },
    ExpressionAttributeValues: {
        ":mid": payload.thread_id
    },
    Limit: 3, // items evaluated per request, before any filtering
    // ExclusiveStartKey: lastEvaluatedKey, // set to the previous response's LastEvaluatedKey to fetch the next page
    ScanIndexForward: false
};
req.dynamo.query(params, function (err, data) {
    console.log(err, data);
});
Limit defines the number of items/records evaluated using only the KeyCondition (if present), before any filters are applied. To solve this problem, as pointed out in earlier answers, one approach could be to use a GSI where the filtering condition is part of your GSI key. However, this is fairly restrictive, as it is not practical to introduce a new GSI for every access pattern which requires pagination. A more realistic approach is to track the Count in the query response and keep querying and appending the next page of results until the aggregated Count satisfies the client-defined limit. Keep in mind that you would need some custom logic to build the LastEvaluatedKey, which is required to fetch the subsequent result page. In Go, this can be achieved in the following way.
func PaginateWithFilters(ctx context.Context, keyCondition string, filteringCondition int, cursor *Cursor) ([]*Records, error) {
    var collectiveResult []map[string]types.AttributeValue
    var records []*Records

    expr, err := buildFilterQueryExpression(keyCondition, filteringCondition)
    if err != nil {
        return nil, err
    }

    queryInput := &dynamodb.QueryInput{
        ExpressionAttributeNames:  expr.Names(),
        ExpressionAttributeValues: expr.Values(),
        KeyConditionExpression:    expr.KeyCondition(),
        FilterExpression:          expr.Filter(),
        TableName:                 aws.String(yourTableName),
        Limit:                     aws.Int32(cursor.PageLimit),
    }
    if cursor.LastEvaluatedKey != nil {
        queryInput.ExclusiveStartKey = cursor.LastEvaluatedKey
    }

    paginator := dynamodb.NewQueryPaginator(dbClient, queryInput)
    for {
        if !paginator.HasMorePages() {
            fmt.Println("no more records in the partition")
            cursor.LastEvaluatedKey = nil
            break
        }
        singlePage, err := paginator.NextPage(ctx)
        if err != nil {
            return nil, err
        }
        pendingItems := int(cursor.PageLimit) - len(collectiveResult)
        if int(singlePage.Count) >= pendingItems {
            collectiveResult = append(collectiveResult, singlePage.Items[:pendingItems]...)
            cursor.LastEvaluatedKey = buildExclusiveStartKey(singlePage.Items[pendingItems-1])
            break
        }
        collectiveResult = append(collectiveResult, singlePage.Items...)
    }

    err = attributevalue.UnmarshalListOfMaps(collectiveResult, &records)
    if err != nil {
        return nil, err
    }
    return records, nil
}
This Medium article discusses pagination with DynamoDB in a bit more depth and includes code snippets in Go to paginate query responses with filters.
