Converting JSON data from an S3 upload and using a Lambda function to push it to DynamoDB - node.js

I've been working on an assignment recently and I feel like I'm very close to solving the problem I'm having, but I just can't seem to find anything that would help online.
As the title states, I've got some JSON data being uploaded from a webpage into an S3 bucket. When a new S3 object is created, I want to take that data and store it in a DynamoDB table.
I'm using a Lambda function and testing with some data I've already stored in my S3 bucket. I can see the data in its key-value pairs in my console.logs, but I just can't work out why it isn't actually being stored.
I have the data broken down into its key-value pairs, e.g. "artist": "Elvis Presley", using JSON.parse(JSON.stringify(data)).
What I'm wondering is how to push this data into the table.
var params = {
  Item: JSON.parse(JSON.stringify(data)),
  ReturnConsumedCapacity: "TOTAL",
  TableName: "s3-to-dynamo-s00187306"
};
dynamo.putItem(params, dynamoResultCallback);
The above code is what I've been trying to use, but it gives me a timeout error. If I bump up the allowed time, I receive a different error about a missing partition key in the item, even though my partition key matches one of the keys in every item.
Really stumped here, any advice is appreciated, thanks in advance.
[edit]
So I used what someone suggested below, the dynamo-db converter, and have some logs which provide some insight into what's going on.
I've now got the data in the correct format for DynamoDB, and each item is parsed correctly as far as I can tell.
As for what dynamo represents, I'm not 100% sure; it's declared at the top of my code, and I think it's the doc client?
[edit 2]
So my "_class" values are all the exact same, might try changing the partition key to title instead? (nevermind this didn't work)

JSON.stringify(data) returns JSON that does not match the DynamoDB format. The low-level DynamoDB API expects a format like this:
Item: {
  'CUSTOMER_ID': {N: '001'},
  'CUSTOMER_NAME': {S: 'Richard Roe'}
}
As you can see, the syntax is not the same. I think you need to use another library, maybe dynamo-converters, or look at the Node.js AWS SDK; there may be a method there that can do this.
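Alternatively, if dynamo in the question is the low-level client, switching to the DocumentClient avoids the manual conversion entirely, since it marshals plain JavaScript objects for you. A minimal sketch under that assumption (AWS SDK for JavaScript v2; the table name is taken from the question, and data is assumed to be the already-parsed JSON object):
// Hedged sketch: assumes AWS SDK v2 and that `data` is the plain JSON object
// parsed from the S3 upload. The DocumentClient converts plain JS values
// into DynamoDB's typed attribute-value format automatically.
const AWS = require("aws-sdk");
const docClient = new AWS.DynamoDB.DocumentClient();

const params = {
  TableName: "s3-to-dynamo-s00187306",
  Item: data // must include the table's partition key attribute
};

docClient.put(params, (err, result) => {
  if (err) console.error("Put failed:", err);
  else console.log("Put succeeded");
});
The SDK also exposes AWS.DynamoDB.Converter.marshall(data) if you would rather keep the low-level putItem call and just convert the item first.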

Related

ADF: can't build simple Json file transformation (one field flattening)

I need help transforming a simple JSON file inside Azure Data Flow. I need to flatten just one field, date_sk, in this example:
{
  "date_sk": {"string": "2021-09-03"},
  "is_influencer": 0,
  "is_premium": -1,
  "doc_id": "234"
}
Desired transformation:
"date_sk": {"string":"2021-09-03"}
to become
"dateToGroupBy" : "2021-09-03"
I create the source stream. Note the strange projection Azure picks: there is no "string" field anymore, but this is how the automatic Azure transformation works for some reason.
Data preview of the same source stream node:
And here's how it suggests I transform it in a separate "Derived Column" modifier. I played with the right-hand side, but this (date_sk.{}) is the only format I was able to pick that does not display an error:
But then the output dateToGroupBy field happens to be empty:
Any ideas on what could have gone wrong and how I can build the expected transformation? Thank you.
Alright, it turned out to be a Microsoft bug in ADF.
ADF stumbles on "string" as a JSON field name and can't handle it, even though schema and data validation pass through OK and show no errors.
When I replace "date_sk": {"string":"2021-09-03"} with "date_sk": {"s1":"2021-09-03"}, or anything other than string, everything starts working just fine and dateToGroupBy is filled with date values taken from date_sk.s1.
When I change the name back to string, the output shows NULL values.
It is supposed to either show an error at the validation stage or handle this field name properly.

Why can I not get records whose array field contains the primary key value in dynamoosejs?

I'm new to DynamoDB and I'm encountering an irritating problem.
I have a record stored in DynamoDB like this:
{
  bmgIds: ["d5a03ea2-e06e-5d01-84b7-94530b1059f7"],
  id: "d5a03ea2-e06e-5d01-84b7-94530b1059f7",
  .....
}
As you see, the bmgIds array contains the id value. I used a scan operation to get records whose bmgIds match my condition, but I cannot get that record with my code:
Model.scan('bmgIds').contains("d5a03ea2-e06e-5d01-84b7-94530b1059f7").exec()
I noticed that with the code above I can get records whose bmgIds field does not contain the id value.
Please help explain where I'm wrong!
Sorry, my English is bad; I'd appreciate any help, thanks.
This is due to the AWS Scan response limit: a single Scan request reads at most 1 MB of data, so matching records beyond the first page are not returned unless you paginate.
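If that is the cause, you need to keep paginating until the whole table has been read. A minimal sketch, assuming Dynamoose v2, where Scan exposes an .all() helper that follows the pagination keys for you (findByBmgId is a hypothetical wrapper, not part of the question's code):
// Hedged sketch: assumes Dynamoose v2. .all() keeps re-issuing the scan
// with the last evaluated key until every page has been read, instead of
// stopping at the 1 MB-per-request limit.
async function findByBmgId(id) {
  return Model.scan("bmgIds").contains(id).all().exec();
}

findByBmgId("d5a03ea2-e06e-5d01-84b7-94530b1059f7")
  .then(results => console.log(results.length, "matching records"));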

BOTO3 only returning partial data from DynamoDB Table

I have a relatively simple DynamoDB table that has a date string as the partition key, a name as a sort key, and a text representation of a JSON document as an attribute named Document.
I'm querying it with this code:
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('SolarData')
response = table.get_item(
    Key={'StartDate': '2019-07-08', 'InverterName': 'Inverter1'},
)
print(response)
The problem I'm having is that the response only contains about the first 10K bytes of the Document, which is about 50K bytes long.
I know that the entire document data is stored as I can see it in the AWS Console and also if I do a get-item from the CLI with this command:
aws dynamodb get-item --table-name SolarData --key "{""StartDate"":{""S"":""2019-07-10""},""InverterName"":{""S"":""Inverter1""}}"
I get the entire document written to the output. So I think the problem is either in my code, like I'm not handling the response correctly, or there is a bug in BOTO3.
I searched through the issues list on the BOTO3 GitHub project but didn't find anything that seemed relevant to this.
If there is data in your response, then the result should be a data structure with an "Item" key; test for that key.
Since you are able to print some of the result, and after consulting the docs, I would try some introspection at a REPL if you can:
import json

item = response['Item']
dir(item)   # what attributes and methods does the returned object have?
len(item)   # how many top-level attributes does the item have?
print(json.dumps(item, indent=4))
I have found these examinations help me get my head around what was returned from an unfamiliar API.

DynamoDB begins with not returning expected results

I'm using NodeJS and DynamoDB. I've never used DynamoDB before, and I'm primarily a C# developer (where this would simply be a .Where(x => x...) call; not sure why Amazon made it any more complicated than that). I'm trying to simply query the table based on whether an id starts with certain characters.
For example, we have the year as the first 2 characters of the Id field, so something like this: 180192, where the year is 2018. (The 20 part is irrelevant; I just wanted to give a human-readable example.) So the Id starts with either 18 or 17, and I simply want to query the db for all rows whose Id starts with 18 (for example; it could be 17 or whatever).
I did look at the documentation and I'm not sure I fully understand it. Here's what I have so far, which is just returning all results and not the expected results:
let params = {
  TableName: db.table,
  ProjectionExpression: "id,CompetitorName,code",
  KeyConditionExpression: "begins_with(id, :year)",
  ExpressionAttributeValues: {
    ':year': '18'
  }
};
return db.docClient.scan(params).promise();
So as you can see, I'm thinking this would be a begins_with call, where I match 18 against the Id. But again, this is returning all results (as if I didn't have KeyConditionExpression at all).
Would love to know where I'm wrong here. Thanks!
UPDATE
So I guess begins_with won't work, since it only works on strings and my id is not a string. As per a commenter's suggestion, I can use BETWEEN, but even that is not working. I either get back all the results or a "Query key condition not supported" error (if I use .scan, I get back all results; if I use .query, I get the error).
Here is the code I'm trying.
let params = {
  TableName: db.table,
  ProjectionExpression: "id,CompetitorName,code",
  KeyConditionExpression: "id BETWEEN :start and :end",
  ExpressionAttributeValues: {
    ':start': 18000,
    ':end': 189999
  }
};
return db.docClient.query(params).promise();
It seems as if there's no actual solution for what I was originally trying to do, unfortunately, which is a huge downfall of DynamoDB. There really needs to be some way to do a 'where' using the values of columns, like you can in virtually any other language. However, I have to admit part of the problem was the way the id was structured; you shouldn't have to rely on the id to get info out of it. Anyway, I did find another column, DateofFirstCapture, and using contains with a year of 2018 or 2017 (the dates are not all in the same format; it's a mess) seems to be working.
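For reference, that workaround amounts to a Scan with a FilterExpression, since contains() isn't allowed in a Query key condition. A minimal sketch, reusing the names from the question:
// Hedged sketch: a full-table Scan filtered on a non-key attribute.
// Note the filter is applied after the read, so this still consumes
// the same read capacity as an unfiltered scan of the whole table.
let params = {
  TableName: db.table,
  ProjectionExpression: "id,CompetitorName,code",
  FilterExpression: "contains(DateofFirstCapture, :year)",
  ExpressionAttributeValues: {
    ":year": "2018"
  }
};
return db.docClient.scan(params).promise();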
If you want to fetch data by id, add it as the partition key. If you want to get data by part of a string, you can use begins_with on the sort key.
begins_with(a, substr): true if the value of attribute a begins with a particular substring.
Source: https://docs.amazonaws.cn/en_us/amazondynamodb/latest/developerguide/Query.html
begins_with and BETWEEN can only be used on sort keys, and for a Query you must always supply the partition key.
So if you change your design to have a unique partition key (or a unique combination of partition and sort keys) and strings like 180192 as the sort key, you will be able to query begins_with(sortKey, ...), as in the sketch below.
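For illustration, with a redesigned table the query could look like this. The recordType partition key and its value are hypothetical attributes invented for this example; the point is only the begins_with-on-sort-key pattern:
// Hedged sketch: assumes a redesigned table with partition key "recordType"
// (hypothetical) and the 180192-style string as the sort key "id".
let params = {
  TableName: db.table,
  KeyConditionExpression: "recordType = :type AND begins_with(id, :year)",
  ExpressionAttributeValues: {
    ":type": "competitor", // hypothetical partition key value
    ":year": "18"
  }
};
return db.docClient.query(params).promise();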

Reading Cassandra Map in Node.js

I have a table created using a map in Cassandra. Now I am trying to read the table from Node.js, and it returns an object for the map. Can I get the item count of the map and loop through it to get the items in the map?
table script
create table workingteam (teamid bigint primary key, status map)
You did not post a lot of details. First, you will need to study the object Cassandra sends you. A good way to start would be to convert it to JSON format and dump it to the output through a log:
console.log("Cassandra sent: %j", object);
I'm guessing that in this object you will find attributes like connection parameters, host, client, etc., but also something iterable that contains the keys and values.
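For example, with the DataStax cassandra-driver a CQL map column comes back as a plain JavaScript object by default, so Object.keys gives you the count and Object.entries lets you loop over it. A minimal sketch under that assumption (contact point, data center, and keyspace names are placeholders):
// Hedged sketch: assumes the DataStax cassandra-driver with its default
// encoding, where a CQL map is returned as a plain JavaScript object.
const cassandra = require("cassandra-driver");
const client = new cassandra.Client({
  contactPoints: ["127.0.0.1"],   // placeholder
  localDataCenter: "datacenter1", // placeholder
  keyspace: "mykeyspace"          // placeholder
});

client.execute("SELECT teamid, status FROM workingteam")
  .then(result => {
    for (const row of result.rows) {
      const status = row.status; // the map column
      console.log("item count:", Object.keys(status).length);
      for (const [key, value] of Object.entries(status)) {
        console.log(key, "=>", value);
      }
    }
  });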
