I am trying to figure out how to perform atomic updates on an item where the source data contains map values whose keys are dynamic.
Looking at the sample data below, I want to atomically update the values in BSSentDestIp and BSRecvDestIp on the same item. I was reading the documentation, but the only thing I could find was list_append, which would leave me with a list of appended keys/values that I would need to traverse and sum later.
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.UpdateExpressions.html
Example of input data:
{
"RecordId": 31,
"UUID": "170ae748-f8cf-4df9-6e08-c0c8a5f029d4",
"UserId": "username",
"DeviceId": "e0:cb:4e:53:ae:ff",
"ExpireTime": 1501445446,
"StartTime": 1501441846,
"EndTime": 1501441856,
"MinuteId": 10,
"PacketCount": 1028,
"ByteSum": 834111,
"BSSent": 98035,
"BSRecv": 736076,
"BSSentDestIp": {
"151.101.129.69": 2518,
"192.168.1.254": 4780,
"192.168.1.80": 14089,
"192.33.31.162": 2386,
"54.239.30.232": 21815,
"54.239.31.129": 6423,
"54.239.31.69": 3255,
"54.239.31.83": 18447,
"98.138.253.109": 3020
},
"BSRecvDestIp": {
"151.101.129.69": 42414,
"151.101.57.174": 20792,
"192.230.66.108": 130175,
"192.33.31.162": 56398,
"23.194.140.100": 26209,
"54.239.26.209": 57210,
"54.239.31.129": 188747,
"54.239.31.69": 41115,
"98.138.253.109": 111775
}
}
NodeJS function executed via Lambda to update Dynamo:
function updateItem(UserIdValue, MinuteIdValue) {
    // Build an expression that increments each counter atomically in place.
    var UpdateExpressionString = "set PacketCount = PacketCount + :PacketCount, \
        ByteSum = ByteSum + :ByteSum, \
        BSSent = BSSent + :BSSent, \
        BSRecv = BSRecv + :BSRecv";
    var params = {
        TableName: gDynamoTable, // global table name
        Key: {
            "UserId": UserIdValue,
            "MinuteId": MinuteIdValue
        },
        UpdateExpression: UpdateExpressionString,
        ExpressionAttributeValues: {
            // gRecordObject is the parsed incoming record (global)
            ":PacketCount": gRecordObject.PacketCount,
            ":ByteSum": gRecordObject.ByteSum,
            ":BSSent": gRecordObject.BSSent,
            ":BSRecv": gRecordObject.BSRecv
        },
        ReturnValues: "UPDATED_NEW"
    };
    dynamo.updateItem(params, function(err, data) {
        if (err) {
            console.log("updateItem Error: " + err);
        } else {
            console.log("updateItem Success: " + JSON.stringify(data));
        }
    });
}
Updating a single item in DynamoDB is atomic: if you read an item and then call PutItem, the PutItem itself is guaranteed to be atomic. It either updates all fields or updates none of them.
The only issue with that is write conflicts. If one process reads an item and updates one map while another process does the same thing in parallel, one PutItem will overwrite the other's recent update and you can lose data.
To solve this issue you can use conditional updates. In a nutshell, they allow you to update an item only if a specified condition is met. You can maintain a version number on every item: increment the version attribute whenever you update, and when you write the item back, check that the version number is still the one you read. Otherwise somebody updated the item while you were working with it, so you need to read it again, perform your update again, and retry the write.
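A minimal sketch of that optimistic-locking pattern with the DocumentClient (the version attribute, table parameter, and helper name are assumptions, not part of the original schema):
var AWS = require("aws-sdk");
var docClient = new AWS.DynamoDB.DocumentClient();

// Read the item, apply your changes locally (e.g. sum the per-IP maps),
// then write it back only if nobody else bumped the version in the meantime.
function putWithVersionCheck(tableName, item, expectedVersion, callback) {
    item.version = expectedVersion + 1;
    var params = {
        TableName: tableName,
        Item: item,
        // The put succeeds only if the stored version is still the one we read.
        ConditionExpression: "#v = :expected",
        ExpressionAttributeNames: { "#v": "version" },
        ExpressionAttributeValues: { ":expected": expectedVersion }
    };
    docClient.put(params, function(err, data) {
        if (err && err.code === "ConditionalCheckFailedException") {
            // Someone else updated the item first: re-read, re-apply, retry.
            callback(new Error("version conflict - retry"), null);
        } else {
            callback(err, data);
        }
    });
}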
Related
I am trying to update an item in DynamoDB by adding a condition, without passing the key in the parameters, and have the update happen as soon as my condition is true. Is it possible to do this?
Below an example of an item:
{
"id" : "bcc2f32e-305e-4469-88e2-463724b5c6a9",
"name" : "toto",
"email" : "toto#titi.com"
}
Where email is unique for items.
I tested this code and it works:
const name = "updateName";
const params = {
    TableName: MY_TABLE,
    Key: {
        id
    },
    UpdateExpression: 'set #name = :name',
    ExpressionAttributeNames: { '#name': 'name' },
    ExpressionAttributeValues: { ':name': name },
    ReturnValues: "ALL_NEW"
}
dynamoDb.update(params, (error, result) => {
    if (error) {
        res.status(400).json({ error: 'Could not update Item' });
    }
    res.json(result.Attributes);
})
But I want to do something like this (replace the Key with a ConditionExpression):
const params = {
    TableName: MY_TABLE,
    UpdateExpression: 'set #name = :name',
    ConditionExpression: '#email = :email',
    ExpressionAttributeNames: {
        '#name': 'name',
        '#email': 'email'
    },
    ExpressionAttributeValues: {
        ':name': name,
        ':email': email
    },
    ReturnValues: "ALL_NEW"
}
dynamoDb.update(params, (error, result) => {
    if (error) {
        res.status(400).json({ error: 'Could not update User' });
    }
    res.json(result.Attributes);
})
But this code doesn't work.
Any ideas?
You cannot update an item in DynamoDB without using the entire primary key (partition key, and sort key if present). This is because you must specify exactly one record for the update. See the documentation here.
If you want to find an item using a field that is not the primary key, then you can search using a scan (potentially slow and expensive) or by using a Global Secondary Index (GSI) on that field. Either of these methods requires that you do a separate request to find the item in question, and then use its primary key to perform the update.
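A hedged sketch of that two-step lookup-then-update, reusing the question's MY_TABLE, email, name, and res, and assuming a GSI named email-index exists on the email attribute:
// Step 1: find the item's primary key via the (assumed) GSI on email.
dynamoDb.query({
    TableName: MY_TABLE,
    IndexName: 'email-index', // assumed GSI name
    KeyConditionExpression: '#email = :email',
    ExpressionAttributeNames: { '#email': 'email' },
    ExpressionAttributeValues: { ':email': email }
}, (error, result) => {
    if (error || result.Items.length === 0) {
        return res.status(404).json({ error: 'Item not found' });
    }
    // Step 2: update using the full primary key we just looked up.
    dynamoDb.update({
        TableName: MY_TABLE,
        Key: { id: result.Items[0].id },
        UpdateExpression: 'set #name = :name',
        ExpressionAttributeNames: { '#name': 'name' },
        ExpressionAttributeValues: { ':name': name },
        ReturnValues: 'ALL_NEW'
    }, (error, result) => {
        if (error) {
            return res.status(400).json({ error: 'Could not update Item' });
        }
        res.json(result.Attributes);
    });
});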
It sounds like you want to do an update that waits for a condition. That's not how DynamoDB works; it cannot wait for anything (except consistency, I suppose, but that's somewhat different). What you can do is make a request with a condition, and if it fails the condition (returning immediately), make the request again later. If you do this you'll need to be careful to back off appropriately, or you might end up making a lot of requests very quickly.
The key is a required parameter when doing updates; the condition expression can be used in addition to providing the key, but can't be used instead of the key.
Also, I am not sure you fully understand what the ConditionExpression is for: it's not like the WHERE clause in an SQL update statement (i.e. update mytable set name='test' where email='myemail.com').
Instead, logically, the ConditionExpression in an update is more like:
update mytable set name='test' where key='12345' but only if quantity > 0
That is, you tell DynamoDB the exact key of the record you want updated, and once it finds that record it uses the condition expression to determine whether the update should proceed: find the record with id=12345 and change the name to 'test', but only if the quantity is greater than 0.
It does not use the ConditionExpression to find records to update.
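To make that concrete, here is a rough DocumentClient version of update mytable set name='test' where key='12345' but only if quantity > 0 (the table and attribute names are purely illustrative):
dynamoDb.update({
    TableName: 'mytable', // illustrative table
    Key: { id: '12345' }, // the exact record to update
    UpdateExpression: 'set #name = :name',
    // Applied only if the condition holds for that one record; otherwise
    // DynamoDB rejects the update with ConditionalCheckFailedException.
    ConditionExpression: '#qty > :zero',
    ExpressionAttributeNames: { '#name': 'name', '#qty': 'quantity' },
    ExpressionAttributeValues: { ':name': 'test', ':zero': 0 }
}, (error, result) => {
    if (error && error.code === 'ConditionalCheckFailedException') {
        console.log('Condition not met; item left unchanged');
    }
});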
I'm trying to retrieve all items from a DynamoDB table that match a FilterExpression, and although all of the items are scanned and half do match, the expected items aren't returned.
I have the following in an AWS Lambda function running on Node.js 6.10:
var AWS = require("aws-sdk"),
    documentClient = new AWS.DynamoDB.DocumentClient();

function fetchQuotes(category) {
    let params = {
        "TableName": "quotient-quotes",
        "FilterExpression": "category = :cat",
        "ExpressionAttributeValues": {":cat": {"S": category}}
    };
    console.log(`params=${JSON.stringify(params)}`);
    documentClient.scan(params, function(err, data) {
        if (err) {
            console.error(JSON.stringify(err));
        } else {
            console.log(JSON.stringify(data));
        }
    });
}
There are 10 items in the table, one of which is:
{
"category": "ChuckNorris",
"quote": "Chuck Norris does not sleep. He waits.",
"uuid": "844a0af7-71e9-41b0-9ca7-d090bb71fdb8"
}
When testing with category "ChuckNorris", the log shows:
params={"TableName":"quotient-quotes","FilterExpression":"category = :cat","ExpressionAttributeValues":{":cat":{"S":"ChuckNorris"}}}
{"Items":[],"Count":0,"ScannedCount":10}
The scan call returns all 10 items when I only specify TableName:
params={"TableName":"quotient-quotes"}
{"Items":[<snip>,{"category":"ChuckNorris","uuid":"844a0af7-71e9-41b0-9ca7-d090bb71fdb8","CamelCase":"thevalue","quote":"Chuck Norris does not sleep. He waits."},<snip>],"Count":10,"ScannedCount":10}
You do not need to specify the type ("S") in your ExpressionAttributeValues because you are using the DynamoDB DocumentClient. Per the documentation:
The document client simplifies working with items in Amazon DynamoDB by abstracting away the notion of attribute values. This abstraction annotates native JavaScript types supplied as input parameters, as well as converts annotated response data to native JavaScript types.
It's only when you're using the raw DynamoDB object via new AWS.DynamoDB() that you need to specify the attribute types (i.e., the simple objects keyed on "S", "N", and so on).
With DocumentClient, you should be able to use params like this:
const params = {
    TableName: 'quotient-quotes',
    FilterExpression: '#cat = :cat',
    ExpressionAttributeNames: {
        '#cat': 'category',
    },
    ExpressionAttributeValues: {
        ':cat': category,
    },
};
Note that I also moved the field name into an ExpressionAttributeNames value, for consistency and safety. It's a good practice because certain field names are reserved words in DynamoDB and will break your requests if used directly in an expression.
I was looking for a solution that combined KeyConditionExpression with FilterExpression and eventually I worked this out.
Here, aws is the UUID. Id is an assigned unique number prefixed with the text 'form' so I can tell I have form data, and optinSite lets me find enquiries from a particular site. Other data is stored too; this is all I need to get the packet.
Maybe this can be of help to you:
let optinSite = 'https://theDomainIWantedTFilterFor.com/';
let aws = 'eu-west-4:EXAMPLE-aaa1-4bd8-9ean-1768882l1f90';
let item = {
    TableName: 'Table',
    KeyConditionExpression: "aws = :Aw and begins_with(Id, :form)",
    FilterExpression: "optinSite = :Os",
    ExpressionAttributeValues: {
        ":Aw": { S: aws },
        ":form": { S: 'form' },
        ":Os": { S: optinSite }
    }
};
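For completeness, a sketch of how those params might be executed; since the values are typed ({ S: ... }), this assumes the low-level client rather than the DocumentClient:
var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB();

// KeyConditionExpression narrows the result by partition key and sort key
// first; FilterExpression then drops non-matching items before they are returned.
dynamodb.query(item, function(err, data) {
    if (err) {
        console.error(err);
    } else {
        console.log(JSON.stringify(data.Items));
    }
});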
I'm attempting to scan between a date range using a Node Lambda function. I have the data being scanned correctly, but I can't seem to get the date expression to work correctly.
var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB({apiVersion: '2012-08-10'});

exports.handler = function(event, context) {
    var tableName = "MyDDBTable";
    dynamodb.scan({
        TableName: tableName,
        FilterExpression: "start_date < :start_date",
        ExpressionAttributeValues: {
            ":start_date": {
                "S": "2016-12-01"
            }
        }
    }, function(err, data) {
        context.succeed(data);
    });
};
This doesn't try to match a range yet; it's just comparing against a single date. I didn't want to add an and to the expression until I knew this much was working.
A sample document in my DynamoDB is structured like so:
{
"end_date": {
"S": "2016-12-02"
},
"name": {
"S": "Name of document"
},
"start_date": {
"S": "2016-10-10"
},
"document_id": {
"N": "7"
}
}
The document_id is my primary key. I'm pretty new to this whole Lambda / DynamoDB combination, so I may have this completely set up wrong, but this is what I've managed to put together through my research.
What I'm ultimately trying to achieve: given a start date and an end date, return all DynamoDB documents whose date range falls within it. Any help would be greatly appreciated.
Firstly, the scan operation itself is correct. The dynamodb.scan should be executed in a loop until LastEvaluatedKey is no longer present in the response. Please refer to this blog.
The Lambda is not returning the result because the data was not found in the first scan page. If you extend the scan until no LastEvaluatedKey is returned, the Lambda is likely to return the result.
For Query and Scan operations, DynamoDB calculates the amount of consumed provisioned throughput based on item size, not on the amount of data that is returned to an application.
If you query or scan for specific attributes that match values that amount to more than 1 MB of data, you'll need to perform another Query or Scan request for the next 1 MB of data. To do this, take the LastEvaluatedKey value from the previous request and use that value as the ExclusiveStartKey in the next request. This approach will let you progressively query or scan for new data in 1 MB increments.
BETWEEN operator sample:
FilterExpression: "start_date BETWEEN :date1 and :date2"
I am trying to get the first 10 items that satisfy a condition from DynamoDB using an AWS Lambda function. I was trying to use the Limit parameter, but (according to the page below) it is the
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/DynamoDB.html#scan-property
"maximum number of items to evaluate (not necessarily the number of matching items)".
How do I get the first 10 items that satisfy my condition?
var AWS = require('aws-sdk');
var db = new AWS.DynamoDB();

exports.handler = function(event, context) {
    var params = {
        TableName: "Events", //"StreamsLambdaTable",
        ProjectionExpression: "ID, description, endDate, imagePath, locationLat, locationLon, #nm, startDate, #tp, userLimit", // specifies the attributes you want in the scan result
        FilterExpression: "locationLon between :lower_lon and :higher_lon and locationLat between :lower_lat and :higher_lat",
        ExpressionAttributeNames: {
            "#nm": "name",
            "#tp": "type"
        },
        ExpressionAttributeValues: {
            ":lower_lon": {"N": event.low_lon},
            ":higher_lon": {"N": event.high_lon},
            ":lower_lat": {"N": event.low_lat},
            ":higher_lat": {"N": event.high_lat}
        }
    };
    db.scan(params, function(err, data) {
        if (err) {
            console.log(err); // an error occurred
        } else {
            data.Items.forEach(function(record) {
                console.log(record.name.S + "");
            });
            context.succeed(data.Items);
        }
    });
};
I think you already know the reason behind this: the distinction that DynamoDB makes between ScannedCount and Count. As per this,
ScannedCount — the number of items that were queried or scanned, before any filter expression was applied to the results.
Count — the number of items that were returned in the response.
The fix for that is documented right above this:
For either a Query or Scan operation, DynamoDB might return a LastEvaluatedKey value if the operation did not return all matching items in the table. To get the full count of items that match, take the LastEvaluatedKey value from the previous request and use it as the ExclusiveStartKey value in the next request. Repeat this until DynamoDB no longer returns a LastEvaluatedKey value.
So, the answer to your question is: use the LastEvaluatedKey from DynamoDB response and Scan again.
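A sketch of that approach for this case: keep scanning pages until 10 matching items have accumulated or the table is exhausted. The paging helper and hard-coded limit are illustrative; the params mirror the question's, with the projection trimmed for brevity:
var AWS = require('aws-sdk');
var db = new AWS.DynamoDB();

function firstTenEvents(event, callback) {
    var matches = [];
    function scanPage(startKey) {
        var params = {
            TableName: "Events",
            FilterExpression: "locationLon between :lower_lon and :higher_lon and locationLat between :lower_lat and :higher_lat",
            ExpressionAttributeValues: {
                ":lower_lon": {"N": event.low_lon},
                ":higher_lon": {"N": event.high_lon},
                ":lower_lat": {"N": event.low_lat},
                ":higher_lat": {"N": event.high_lat}
            }
        };
        if (startKey) {
            params.ExclusiveStartKey = startKey; // resume from the previous page
        }
        db.scan(params, function(err, data) {
            if (err) return callback(err);
            matches = matches.concat(data.Items);
            if (matches.length < 10 && data.LastEvaluatedKey) {
                scanPage(data.LastEvaluatedKey); // not enough matches yet
            } else {
                callback(null, matches.slice(0, 10)); // trim any overshoot
            }
        });
    }
    scanPage();
}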
I am processing a stream of text data where I don't know ahead of time what the distribution of its values is, but I know each record looks like this:
{
"datetime": "1986-11-03T08:30:00-07:00",
"word": "wordA",
"value": "someValue"
}
I'm trying to bucket it into RethinkDB objects based on its value, where the objects look like the following:
{
"bucketId": "1",
"bucketValues": {
"wordA": [
{"datetime": "1986-11-03T08:30:00-07:00"},
{"datetime": "1986-11-03T08:30:00-07:00"}
],
"wordB": [
{"datetime": "1986-11-03T08:30:00-07:00"},
{"datetime": "1986-11-03T08:30:00-07:00"}
]
}
}
The purpose is to eventually count the number of occurrences for each word in each bucket.
Since I'm dealing with about a million buckets and have no knowledge of the words ahead of time, the plan is to create these objects on the fly. I am new to RethinkDB, however, and while I have tried to do this in a way that never adds a word key to a bucket that doesn't exist yet, I am not entirely sure I'm following best practice in chaining the commands as follows (note that I am running this on a Node.js server):
var bucketId = "someId";
var word = "someWordValue";

r.do(r.table("buckets").get(bucketId), function(result) {
    return r.branch(
        // If the bucket doesn't exist
        result.eq(null),
        // Create it
        r.table("buckets").insert({
            "id": bucketId,
            "bucketValues": {}
        }),
        // Else do nothing
        "Bucket already exists"
    );
})
.run()
.then(function(result) {
    console.log(result);
    r.table("buckets").get(bucketId)
        .do(function(bucket) {
            return r.branch(
                // If the word already exists
                bucket("bucketValues").keys().contains(word),
                // Just append to it (code not implemented yet)
                "Word already exists",
                // Else create the word and append it
                r.table("buckets").get(bucketId).update(
                    {"bucketValues": r.object(word, [/*Put the timestamp here*/])}
                )
            );
        })
        .run()
        .then(function(result) {
            console.log(result);
        });
});
Do I need to execute run here twice, or am I way off base on how you're supposed to properly chain things together with RethinkDB? I just want to make sure I'm not doing this the wrong/hard way before I get much deeper into this.
You don't have to execute run multiple times; it depends on what you want. Basically, run() ends the chain and sends the query to the server: we build up the whole query and finish it with run() to execute it. If you call run() two times, a query is sent to the server two times.
So if we can do all the processing using only RethinkDB functions, we need to call run only once. However, if we want to do some kind of post-processing of the data on the client side, then we have no choice. Usually I try to do all processing in RethinkDB: with control structures, looping, and anonymous functions we can go pretty far without making the client do any logic.
In your case, the query can be rewritten in NodeJS using the official driver:
var r = require('rethinkdb');

var bucketId = "someId2";
var word = "someWordValue2";

r.connect()
    .then((conn) => {
        r.table("buckets").insert({
            "id": bucketId,
            "bucketValues": {}
        })
        .do((result) => {
            // We don't care about the insert result at all;
            // we just want to ensure the bucket is there.
            return r.table('buckets').get(bucketId)
                .update(function(bucket) {
                    return {
                        'bucketValues': r.object(
                            word,
                            bucket('bucketValues')(word).default([])
                                .append(r.now()))
                    };
                });
        })
        .run(conn)
        .then((result) => { conn.close(); });
    });