I'm writing a script that saves a CSV into a DynamoDB table. I'm using Node.js and the aws-sdk module. Everything seems to be correct, but while I'm sending over 50k records to DynamoDB, only 1181 are saved and shown in the web console.
I've tried with different numbers of records, and 1181 is the largest count I get, no matter whether I try saving 100k, 10k or 50k.
According to AWS's documentation there shouldn't be any limit on the number of records, so any idea as to what other factors could be causing this hard limit?
BTW, my code is catching errors from the insert actions, and I'm not picking up any when inserting past the 1181 mark, so the module is not really helping.
Any extra idea would be appreciated.
If you're using DynamoDB BatchWriteItem or another batch insert, you need to check the "UnprocessedItems" element in the response. Sometimes batch writes exceed the provisioned write capacity of your table, and DynamoDB will not process all of your inserts; that sounds like what is happening here.
You should check the response of each insert and, if there are unprocessed items, set up a retry with an exponential backoff strategy in your code. This will allow the remaining items to be inserted until your entire CSV is processed.
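For example, here is a minimal sketch of that retry loop in Python with boto3 (the same "UnprocessedItems" element comes back from batchWriteItem in the JavaScript SDK); the table name, the pre-built put requests, and the retry limits are placeholders:

import time
import boto3

dynamodb = boto3.client('dynamodb')

def batch_write_with_retry(table_name, put_requests, max_retries=8):
    # `put_requests` is a list of {'PutRequest': {'Item': {...}}} entries.
    # Items are sent in chunks of 25 (the BatchWriteItem limit), and anything
    # reported back in UnprocessedItems is retried with exponential backoff.
    for i in range(0, len(put_requests), 25):
        request_items = {table_name: put_requests[i:i + 25]}
        retries = 0
        while request_items:
            response = dynamodb.batch_write_item(RequestItems=request_items)
            request_items = response.get('UnprocessedItems', {})
            if request_items:
                if retries >= max_retries:
                    raise RuntimeError('Unprocessed items remain after retries')
                time.sleep(0.05 * (2 ** retries))  # exponential backoff
                retries += 1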
Here is the reference link for DynamoDB BatchWriteItem if you want to take a closer look at the response elements. Good luck!
I tried to create more than 100 documents in a batch and received a 400 (Bad Request) result from the server with the error "Batch request has more operations than what is supported".
Creating 100 documents works fine, so clearly there is a limit of 100 operations per batch. I couldn't find any documentation of this anywhere, nor a solution.
I cannot store them in separate batches, because even if one document fails to store I want the others to roll back as well. Can somebody please guide me on how to achieve this using Cosmos DB?
Transactional Batch has two size-related upper limits (aside from the restriction that a batch must be within the same partition of the same collection):
100 items
2MB payload
Going beyond 100 items (or 2 MB) will require you to iterate through multiple batches, checkpointing after each successfully written batch. How you accomplish this is really up to you, as there is no built-in mechanism.
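As an illustration only, here is a minimal sketch of that chunk-and-checkpoint pattern in Python; write_transactional_batch is a hypothetical callable standing in for whatever transactional batch call your Cosmos DB SDK exposes, and the checkpoint store is just an in-memory set:

def chunk(items, size=100):
    # Yield successive slices of at most `size` items (the per-batch limit).
    for i in range(0, len(items), size):
        yield items[i:i + size]

def write_all(items, write_transactional_batch, checkpoints):
    # `write_transactional_batch` is a hypothetical wrapper around your SDK's
    # transactional batch call; `checkpoints` records which batches succeeded
    # so a rerun can resume where it left off.
    for index, batch in enumerate(chunk(items)):
        if index in checkpoints:
            continue  # already written on a previous run
        write_transactional_batch(batch)  # all-or-nothing within this one batch
        checkpoints.add(index)

Note that this only gives you atomicity within each 100-item batch; there is no cross-batch rollback.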
The limitations on batch item count and size are documented here.
I would like to check how many entries are in a DynamoDB table that matches a query without retrieving the actual entries, using boto3.
I want to run a machine learning job on data from DynamoDB table. The data I'm training on is a data that answers a query, not the entire table. I want to run the job only if I have enough data to train on.
Therefore, I want to check that I have enough entries matching the query.
It is worth mentioning that the DynamoDB table I'm querying is really big, so actually retrieving the entries is not an option unless I'm going to run the job.
I know that I can use boto3.dynamodb.describe_table() to get the number of entries in the entire table, but as I mentioned earlier, I only want to know how many entries match a query.
Any ideas?
This was asked and answered in the past, see How to get item count from DynamoDB?
Basically, you need to use the "Select" parameter to tell DynamoDB to only count the query's results, instead of retrieving them.
As usual in DynamoDB, this is subject to paging: if the result set (not the count - the actual full results) is larger than 1 MB, then only the first 1 MB is scanned, the items in it are counted, and you get back that partial count. If you're only interested in checking whether you have "enough" results, this may even be better for you, because you don't want to pay for reading a gigabyte of data just to check that the data is there. You can even ask for a smaller page, to read less, depending on what you consider enough data.
Just remember that you'll pay Amazon not by the amount of data returned (just one integer, the count) but by the amount of data read from disk. Using such counts excessively may lead to surprisingly large costs.
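As a rough sketch of what that looks like in boto3 (the table name, key name, and threshold below are placeholders for your own schema):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('my-table')  # placeholder table name

def has_enough_matches(partition_value, threshold):
    # Count the items matching the query without retrieving them, stopping as
    # soon as `threshold` is reached so we don't read more pages than necessary.
    total = 0
    kwargs = {
        'KeyConditionExpression': Key('pk').eq(partition_value),  # placeholder key
        'Select': 'COUNT',  # return only the count, not the items themselves
    }
    while True:
        response = table.query(**kwargs)
        total += response['Count']
        if total >= threshold:
            return True
        last_key = response.get('LastEvaluatedKey')
        if last_key is None:
            return False  # no more pages and still under the threshold
        kwargs['ExclusiveStartKey'] = last_key  # continue past the 1 MB page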
I'm a beginner trying to work through this problem. I have a list of 3000 items in a CSV that I need to process each day and write to a DynamoDB table. I iterate through the list in Node.js, and in each loop iteration I fire a conditional update to DynamoDB to update that entry.
However, wouldn't this approach be a poor fit for DynamoDB, since I'd have a burst of 3000 write requests in one second and nothing for the rest of the 24 hours?
Here are some of my thoughts:
Sequential writes: waiting for the previous write to complete before continuing to the next one? I think this would still overload my write capacity.
Using some sort of messaging queue service? This seems like overkill, as the program I'm writing is a simple service that parses a CSV file daily.
What kind of approach should I take to solve this problem?
This kind of scenario sounds like a good fit for DynamoDB's on-demand capacity mode. That way you pay only for the operations you perform and for storage; when you aren't doing any operations, you pay for storage alone.
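For reference, switching an existing table to on-demand billing is a single UpdateTable call. Here is a minimal sketch with boto3 (the JavaScript SDK exposes the same operation); the table name is a placeholder:

import boto3

client = boto3.client('dynamodb')

# Switch the table from provisioned capacity to on-demand (pay-per-request)
# billing; 'my-table' is a placeholder for your table name.
client.update_table(TableName='my-table', BillingMode='PAY_PER_REQUEST')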
Using Boto3's batch insert, what is the maximum number of records I can insert into a DynamoDB table? Suppose I'm reading my input JSON from an S3 bucket that is 6 GB in size.
Would it cause any performance issues while inserting as a batch? Any sample would be helpful. I just started looking into this; I'll update here based on my findings.
Thanks in advance.
You can use the Boto3 batch_writer() function to do this. The batch writer handles chunking up the items into batches, retrying, etc. You create the batch writer as a context manager, add all of your items within the context, and the batch writer sends your batch requests when it exits the context.
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('table-name')

# The batch writer chunks items into batches of 25, retries unprocessed
# items, and flushes the final partial batch when the context exits.
with table.batch_writer() as writer:
    for item in table_data:
        writer.put_item(Item=item)
There's a complete working code example on GitHub here: https://github.com/awsdocs/aws-doc-sdk-examples/blob/master/python/example_code/dynamodb/batching/dynamo_batching.py.
You can find information like this in the service documentation for BatchWriteItem:
A single call to BatchWriteItem can write up to 16 MB of data, which can comprise as many as 25 put or delete requests. Individual items to be written can be as large as 400 KB.
There are no performance issues, aside from consuming the write capacity units.
I have a scenario: in the DB I have a table with a huge number of records (2 million), and I need to export them to xlsx or csv.
So the basic approach I used is to run a query against the DB and put the data into an appropriate file for download.
Problems:
There is a DB timeout that I have set to 150 seconds, which sometimes isn't enough, and I am not sure whether extending the timeout would be a good idea!
There is also a timeout on the Express request, so my HTTP request times out and gets fired a second time (for an unknown reason).
So as a solution, I am thinking of using a streaming DB connection, and if I can somehow pipe that into an output stream for the file, it should work.
So basically I need help with the second part: in the stream I would receive records one by one, and at the same time I am thinking of letting the user download the file progressively (this would avoid the request timeout).
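To make the idea concrete, here is a minimal, framework-agnostic sketch (in Python, purely as an illustration); the row iterator and the writable stream are placeholders for a streaming DB cursor and the HTTP response (res in Express), which are the parts I'm trying to wire together:

import csv

def stream_rows_to(writable, row_iterator, header):
    # `row_iterator` stands in for a streaming DB cursor and `writable` for the
    # HTTP response stream (or any file-like object). Nothing is buffered beyond
    # the current row, so memory stays flat no matter how many records the query
    # returns, and the client starts receiving data immediately.
    writer = csv.writer(writable)
    writer.writerow(header)
    for row in row_iterator:
        writer.writerow(row)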
I don't think this is a unique problem, but I couldn't find the appropriate pieces to put together. Thanks in advance!
If you check your log, do you see the query run more than once?
Does your UI time out before the server even reaches res.end()?