AWS Maximum BadRequestException retries reached for query Using Data API to Query RDS Serverless Aurora - python-3.x

I have created a Lambda function that uses the awswrangler Data API integration to read data from an RDS Serverless Aurora PostgreSQL database with a query. The query contains a conditional that is a list of IDs. If the query has fewer than 1,000 IDs it works great; if over 1,000, I get this message:
Maximum BadRequestException retries reached for query
An example query is:
"""select * from serverlessDB where column_name in %s""" % ids_list
I adjusted the serverless RDS instance to force scaling and increased the concurrency on the Lambda function. Is there a way to fix this issue?

With PostgreSQL, 1000+ items in a WHERE IN clause should be just fine.
I believe you are running into a Data API limit.
There isn't a fixed upper limit on the number of parameter sets. However, the maximum size of the HTTP request submitted through the Data API is 4 MiB. If the request exceeds this limit, the Data API returns an error and doesn't process the request. This 4 MiB limit includes the size of the HTTP headers and the JSON notation in the request. Thus, the number of parameter sets that you can include depends on a combination of factors, such as the size of the SQL statement and the size of each parameter set.
The response size limit is 1 MiB. If the call returns more than 1 MiB of response data, the call is terminated.
The maximum number of requests per second is 1,000.
Either your request exceeds 4 MiB or the response of your query exceeds 1 MiB. I suggest you split your query into multiple smaller queries.
Reference: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html
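A minimal sketch of that splitting approach, assuming numeric IDs and a run_query helper that stands in for whatever awswrangler/Data API call you already use (both the helper and the chunk size are placeholders, not part of awswrangler itself):

import pandas as pd

CHUNK_SIZE = 500  # tune so each request stays well under 4 MiB and each response under 1 MiB

def query_in_chunks(ids, run_query, chunk_size=CHUNK_SIZE):
    """Run the IN-list query in batches and concatenate the resulting DataFrames."""
    frames = []
    for i in range(0, len(ids), chunk_size):
        chunk = ids[i:i + chunk_size]
        placeholders = ", ".join(str(int(x)) for x in chunk)  # assumes numeric IDs
        sql = f"select * from serverlessDB where column_name in ({placeholders})"
        frames.append(run_query(sql))  # run_query wraps your existing Data API call
    return pd.concat(frames, ignore_index=True)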

Related

Find a specific index read

I'm having an issue with a roxie query when being called via an API. I've tested the query in HThor and the web service on 8002 and it runs perfectly. When calling it from an API I'm getting memory pool exhausted. The only clue I have in the logs is
Pool memory exhausted: pool id 0 exhausted, requested 1 heap(4095/4294967295) global(4096/4096) WM(1..128) (in Index Read 40)"
How can I find out which index read this is referring to?
Thanks
David
The index read is indicated in the brackets - it is activity 40 in the graph.
The error message is indicating that all of the memory has been exhausted (4096 pages of 256K each). It is likely to be an index read that has a poor filter condition, returns a large number of rows and is feeding into an activity that needs all the input rows to process it (e.g. a sort).

Elasticsearch 429 Too Many Requests _bulk with synchronous requests

I am using the AWS Elasticsearch service. The dev environment runs a t3.small instance.
I have approximately 15,000 records that I want to index in bulk. I split them into chunks of 250 items each (each under 10 MiB) and run the _bulk requests with the refresh="wait_for" option, one by one, waiting until each request finishes before sending the next one.
At some point, around the 25th iteration, the request immediately fails with the message
429 Too Many Requests /_bulk
For what it's worth, with a chunk size of 500 it fails around request 25/2 (roughly the 12th).
It doesn't say anything more than that. I cannot understand why this happens, since nothing else could be sending bulk requests in parallel with me. I checked that the data size is less than 10 MiB.
What I have already tried
I send each request sequentially, awaiting the previous one
Each bulk request is smaller than 10 MiB
Each bulk request contains no more than 250 records (plus 250 action lines indicating the index operation)
I am using refresh="wait_for"
I even added a 2-second delay before sending each new request (which I would very much like to remove)
Adding new instances or increasing storage space doesn't help at all
What could be the reason for this error? How can I guarantee that my requests will not fail if I send everything sequentially? Is there any additional option I can pass to be sure that a bulk request has completely finished?
A 429 error message as a write rejection indicates a bulk queue error. The es_rejected_execution_exception[bulk] indicates that your queue is full and that any new requests are rejected. When the number of requests to the Elasticsearch cluster exceeds the bulk queue size (threadpool.bulk.queue_size), this bulk queue error occurs. A bulk queue on each node can hold between 50 and 200 requests, depending on which Elasticsearch version you are using.
You can consult this link https://aws.amazon.com/premiumsupport/knowledge-center/resolve-429-error-es/ and review the write-rejection best practices.
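Since the 429 means the node's bulk queue pushed back, one common mitigation is to retry the same chunk with exponential backoff. A rough sketch, assuming plain HTTP access to the domain (the endpoint URL is a placeholder, and signed requests may be required in your setup):

import time
import requests

BULK_URL = "https://my-es-domain.example.com/my-index/_bulk?refresh=wait_for"  # placeholder endpoint

def send_bulk_with_backoff(payload, max_retries=5):
    """POST an NDJSON bulk payload, backing off and retrying when the node answers 429."""
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(
            BULK_URL,
            data=payload,
            headers={"Content-Type": "application/x-ndjson"},
        )
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        time.sleep(delay)  # the bulk thread-pool queue was full; wait and try again
        delay *= 2
    raise RuntimeError("bulk request still rejected after retries")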

Can retrieve only 20 documents from a folder

I have a SpringCM folder that has thousands of small files in it. I'm retrieving the documents this way:
GET /v201411/folders/{id}/documents
but when it executes, I get back only 20 files. The sum of all of their sizes is 1.8 MB, and the Content-Length of the response -> content -> headers is only 3.8 MB.
I didn't find anything in their documentation that mentions a limit on retrieving documents via the API.
Is that really a limitation of SpringCM?
From the documentation on API Collections:
Limit (integer): The maximum number of elements retrieved per request.
Default limit is 20. Maximum limit is 100
When there are more items in the collection than the specified limit, the application can page through the collection, retrieving the objects in chunks by specifying the limit and/or offset on the query string when the collection is requested. The first, previous, next, and last properties are added as a convenience by appending the appropriate limit and offset to the URI, and a GET request can be made to the URIs specified by these properties to navigate the collection.
To minimize the number of sequential calls you need to make, you can adjust the limit property up to the max, 100.
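A minimal sketch of that paging pattern, assuming a bearer token and that the response carries Items and Total fields (the token, folder ID, and those field names are placeholders; following the next link from the response would work just as well):

import requests

BASE = "https://api.springcm.com/v201411"    # API version from the question
TOKEN = "YOUR_ACCESS_TOKEN"                  # placeholder credential
FOLDER_ID = "YOUR_FOLDER_ID"                 # placeholder folder UID

def list_all_documents(folder_id):
    """Page through a folder's documents 100 at a time by advancing the offset."""
    headers = {"Authorization": f"Bearer {TOKEN}"}
    items, offset, limit = [], 0, 100        # 100 is the documented maximum limit
    while True:
        resp = requests.get(
            f"{BASE}/folders/{folder_id}/documents",
            params={"limit": limit, "offset": offset},
            headers=headers,
        )
        resp.raise_for_status()
        page = resp.json()
        items.extend(page.get("Items", []))           # assumed collection field
        if offset + limit >= page.get("Total", 0):    # assumed total-count field
            return items
        offset += limit

docs = list_all_documents(FOLDER_ID)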

Why does querying Azure table storage with ExecuteQuery return fewer results than ExecuteQuerySegmented?

I'm curious whether Azure can time out on ExecuteQuery, or silently error due to a memory limit, causing ExecuteQuery to return fewer records than ExecuteQuerySegmented.
When I run ExecuteQuery, I get a total of 1,223,749 records.
When I run ExecuteQuerySegmented, I get a total of 1,482,504 records.
The two queries are:
(the ExecuteQuerySegmented call is inside a do/while loop that handles the continuation token)
var queryResult = table.ExecuteQuerySegmented(new TableQuery<RecordType>().Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, PartitionValue)), token);
var query = new TableQuery<RecordType>().Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, PartitionValue));
results.AddRange(table.ExecuteQuery(query));
A call to a Table service API can include a server timeout interval, specified in the timeout parameter of the request URI. If the server timeout interval elapses before the service has finished processing the request, the service returns an error.
The maximum timeout interval for Table service operations is 30 seconds. The Table service automatically reduces any timeouts larger than 30 seconds to the 30-second maximum.
The Table service enforces server timeouts as follows:
Query operations: During the timeout interval, a query may execute for up to a maximum of five seconds. If the query does not complete within the five-second interval, the response includes continuation tokens for retrieving remaining items on a subsequent request. See Query Timeout and Pagination for more information.
I'd recommend checking the documentation above.
If the query takes more than 30 seconds, the lower record count shown by ExecuteQuery may be caused by the timeout.
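For comparison, walking every continuation token explicitly makes it obvious when the enumeration is complete. A rough Python sketch using the azure-data-tables SDK (the connection string, table name, and partition value are placeholders):

from azure.data.tables import TableClient

conn_str = "YOUR_STORAGE_CONNECTION_STRING"   # placeholder
table = TableClient.from_connection_string(conn_str, table_name="RecordType")

total = 0
pages = table.query_entities(query_filter="PartitionKey eq 'PartitionValue'").by_page()
for page in pages:                  # each page corresponds to one segmented request
    total += sum(1 for _ in page)   # count the entities in this segment

print(total)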

Count(*) on a SimpleDB table of millions of entries

How long should it take to get a response to the statement SELECT count(*) FROM db_name on a SimpleDB table of millions of entries? (My table currently has >16M.)
Shouldn't there be some sort of "pagination" using the next_token parameter if the operation takes too long? (It's been hanging there for minutes now!)
There's something wrong. No count query will take more than 5 seconds, because after 5 seconds it cuts off and gives you a next token.
If the count request takes more than five seconds, Amazon SimpleDB returns the number of items that it could count and a next token to return additional results. The client is responsible for accumulating the partial counts.
http://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/CountingDataSelect.html
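A minimal sketch of accumulating those partial counts with boto3's SimpleDB client (db_name is taken from the question; the response handling assumes the documented Select output shape):

import boto3

sdb = boto3.client("sdb")
total, token = 0, None

while True:
    kwargs = {"SelectExpression": "select count(*) from db_name"}
    if token:
        kwargs["NextToken"] = token
    resp = sdb.select(**kwargs)
    for item in resp.get("Items", []):               # count(*) comes back as a single item
        for attr in item.get("Attributes", []):
            if attr.get("Name") == "Count":
                total += int(attr["Value"])          # partial count for this request
    token = resp.get("NextToken")
    if not token:                                    # no token means the count is complete
        break

print(total)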
SimpleDB responses are typically under 200ms, not counting data transfer speed (from Amazon's server to yours, which is less than 50ms if you're on EC2).
However, the total size of a SimpleDB response cannot exceed 2,500 rows or 1MB, whichever is smaller.
See "Limit" here
http://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/index.html?UsingSelect.html
