DynamoDB queries take 2000ms for no obvious reason - node.js

I am using DynamoDB to query some data. At the bottom you can see the number of milliseconds for a certain percentage of requests.
Most of the time DynamoDB works fine, with about 100ms response time (there are only queries on the primary key or indexes). About 0.4% of requests take more than 800ms (the limit within which the service needs to respond), even at "calm" times. It's not perfect, but good enough.
However, a certain load (which is not even big) triggers DynamoDB behaviour that causes about 5% of requests to have 2000ms response times. There are several queries in one request, and if any one of them takes more than 800ms, the whole request is terminated by the caller.
We are using Node.js and this is how we use the AWS library:
This is how we initialize the client:
const AWS = require('aws-sdk');

const dynamodb = new AWS.DynamoDB.DocumentClient({
  region: config.server.region,
});

// ...

dynamodb.query(params).promise();
Example of a query:
{
  "TableName": "prod-UserDeviceTable",
  "KeyConditionExpression": "#userId = :userId",
  "ExpressionAttributeNames": {
    "#userId": "userId"
  },
  "ExpressionAttributeValues": {
    ":userId": "e3hs8etjse3t13se8h7eh4"
  },
  "ConsistentRead": false
}
A few more notes:
The service's CPU is autoscaled to stay below 10%; higher load or CPU usage does not make the service perform worse (we even pushed it to 50% with the same response times)
There is no gradual growth of response time. It's usually less than 100ms and then, out of nowhere, it's 2000ms for 1-5% of requests, as you can see in the graph
The queries do not use ConsistentRead
We are using Global Tables in 4 regions. This happens in all regions (the service is deployed independently in all 4 regions as well)
DynamoDB metrics for the tables used do not show any spikes in response times
The service is deployed on ECS Fargate, but it was previously deployed on ECS EC2 with the same results
All tables in all regions are on-demand
The 2000ms delay can happen anywhere in the service during the request lifetime when querying DynamoDB (i.e. it happens for different queries, even for different tables)
When I repeat the same request that took more than 2000ms, it takes only 100ms or less
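A minimal wrapper like the following can confirm the delay is observed on the client side (timedQuery is a hypothetical helper; the 800ms threshold mirrors the caller's limit):

async function timedQuery(params) {
  const start = Date.now();
  try {
    return await dynamodb.query(params).promise();
  } finally {
    const elapsed = Date.now() - start;
    if (elapsed > 800) {
      // The spike shows up here even though the table's own metrics stay
      // flat, which points at the client side rather than DynamoDB.
      console.warn(`slow DynamoDB query on ${params.TableName}: ${elapsed}ms`);
    }
  }
}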

Related

OpenSearch (ElasticSearch) latency issue under multi-threading using RestHighLevelClient

We use RestHighLevelClient to query AWS OpenSearch in our service. Recently we have seen some latency issues related to OpenSearch calls, so I'm doing a stress test to troubleshoot, but I've observed some unexpected behavior.
In our service, when a request is received, we start 5 threads and make one OpenSearch call within each thread in parallel, in order to achieve latency similar to a single call. During load tests, even when I send traffic at 1 TPS, for the same request I see very different latency numbers across the threads. Usually one or two threads see huge latency compared to the others, as if those threads were blocked by something: for example 390ms, 300ms, 1.1s, 520ms, and 30ms across the five threads. Meanwhile, I don't see any search latency spike reported by the OpenSearch service, with the max SearchLatency staying under 350ms the whole time.
I read that the low-level REST client used inside RestHighLevelClient manages a connection pool with very small default maxConn values, so I've overridden both DEFAULT_MAX_CONN_PER_ROUTE (to 100) and DEFAULT_MAX_CONN_TOTAL (to 200) when creating the client, but it doesn't seem to be working, judging by the test results before and after updating these two values.
I'm wondering if anyone has seen similar issues or has any ideas on what could be the reason for this behavior. Thanks!

How to avoid database from being hit hard when API is getting bursted?

I have an API which allows other microservices to call on to check whether a particular product exists in the inventory. The API takes in only one parameter which is the ID of the product.
The API is served through API Gateway, backed by a Lambda that simply queries a Postgres RDS to check for the product ID. If it finds the product, it returns the information about the product in the response. If it doesn't, it just returns an empty response. The SQL is basically this:
SELECT * FROM inventory where expired = false and product_id = request.productId;
However, the problem is that many services are calling this particular API very heavily to check the existence of products. Not only that, the calls often come in bursts. I assume those services loop through a list of product IDs and check for their existence individually, hence the burst.
The number of concurrent calls on the API has resulted in it making many queries to the database. The rate can burst beyond 30 queries per second, and there can be a few hundred thousand requests to fulfil. The queries are mostly the same, except for the product ID in the WHERE clause. The column has been indexed and a query takes an average of only 5-8ms to complete. Still, the connection to the database occasionally times out when the rate gets too high.
I'm using Sequelize as my ORM, and the error I get when it times out is SequelizeConnectionAcquireTimeoutError. There is a good chance that the burst rate was too high and it maxed out the pool too.
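For reference, the Sequelize pool limits look roughly like this (the values are illustrative and DATABASE_URL is an assumed env var; acquire is the timeout that surfaces as SequelizeConnectionAcquireTimeoutError):

const { Sequelize } = require('sequelize');

const sequelize = new Sequelize(process.env.DATABASE_URL, {
  pool: {
    max: 20,        // hard cap on concurrent connections per process
    min: 0,
    acquire: 30000, // ms to wait for a free connection before throwing
                    // SequelizeConnectionAcquireTimeoutError
    idle: 10000     // ms before an idle connection is released
  }
});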
Some options I have considered:
Using a cache layer. But I have noticed that, most of the time, 90% of the product IDs in the requests are not repeated. This would mean that 90% of the time it would be a cache miss and the request would still query the database.
Auto-scaling the database up. But because the calls are bursty and I don't know when they may come, the autoscaling won't complete in time to avoid the timeouts. Moreover, the query is a very simple select statement and the CPU of the RDS instance hardly crosses 80% during the bursts, so I doubt scaling would do much either.
What other techniques can I use to avoid the database being hit hard when the API gets burst calls that are mostly unique and difficult to cache?
Use a cache from boot time
You can load all the necessary columns into an in-memory data store (Redis). Every database update (applied by a cron job) refreshes the cached data.
Problems: memory overhead of keeping the cache updated
Limit DB calls
Create a buffer for IDs: store n IDs and then make one query for all of them, or empty the buffer every m seconds (a rough sketch follows below).
Problems: added client response time; extra processing to fan the query result back out
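A rough sketch of that buffering idea, assuming the Sequelize instance from the question (checkProduct and the 100ms flush interval are illustrative):

// Buffer lookups and answer them in batches with a single IN query.
let pending = new Map(); // productId -> array of waiting resolvers

function checkProduct(productId) {
  return new Promise((resolve) => {
    if (!pending.has(productId)) pending.set(productId, []);
    pending.get(productId).push(resolve);
  });
}

setInterval(async () => {
  if (pending.size === 0) return;
  const batch = pending; // swap out the current buffer
  pending = new Map();
  const rows = await sequelize.query(
    'SELECT * FROM inventory WHERE expired = false AND product_id IN (:ids)',
    { replacements: { ids: [...batch.keys()] }, type: Sequelize.QueryTypes.SELECT }
  );
  const byId = new Map(rows.map((r) => [r.product_id, r]));
  for (const [id, resolvers] of batch) {
    resolvers.forEach((resolve) => resolve(byId.get(id) || null)); // null = not found
  }
}, 100);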
Change your database
Use a NoSQL database for this data. According to this article and this one, I think choosing a NoSQL database is a better idea.
Problems: multiple data stores
Start with a covering index to handle your query. You might create an index like this for your table:
CREATE INDEX inv_lkup ON inventory (product_id, expired) INCLUDE (col, col, col);
Mention all the columns in your SELECT in the index, either in the main list of indexed columns or in the INCLUDE clause. Then the DBMS can satisfy your query completely from the index. It's faster.
You could start using AWS Lambda throttling to handle this problem. But for that to work, the consumers of your API will need to retry when they get 429 responses, which might be super inconvenient.
Sorry to say, you may need to stop using Lambda. Ordinary web servers have good machinery for managing burst workloads.
They have an incoming connection (TCP/IP listen) queue. Each new request coming in lands in that queue, where it waits until the server software accepts the connection. When the server is busy, requests wait in that queue; under high load they simply wait a bit longer. In Node.js's case, if you use clustering there's just one of these incoming connection queues, and all the processes in the cluster share it.
The server software you run (to handle your API) has a pool of connections to your DBMS. That pool has a maximum number of connections in it. As your server software handles each request, it awaits a connection from the pool. If no connection is immediately available, the request handling pauses until one is, then proceeds. This too smooths out the requests to the DBMS. (Be aware that each process in a Node.js cluster has its own pool.)
Paradoxically, a smaller DBMS connection pool can improve overall performance, by avoiding too many concurrent SELECTs (or other queries) on the DBMS.
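A minimal node-postgres sketch of that bounded-pool behaviour (connection details are assumed; pg queues pool.query() calls internally once max clients are checked out):

const { Pool } = require('pg');

// At most 10 queries reach the DBMS concurrently; the rest wait in the
// pool's internal queue, smoothing bursts before they hit Postgres.
const pool = new Pool({
  connectionString: process.env.DATABASE_URL, // assumed env var
  max: 10
});

async function findProduct(productId) {
  const { rows } = await pool.query(
    'SELECT * FROM inventory WHERE expired = false AND product_id = $1',
    [productId]
  );
  return rows[0] || null;
}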
This kind of server configuration can be scaled out: a load balancer will do. So will a server with more cores and more nodejs cluster processes. An elastic load balancer can also add new server VMs when necessary.

What should be done when the provisioned throughput is exceeded?

I'm using the AWS SDK for JavaScript (Node.js) to read data from a DynamoDB table. The auto scaling feature does a great job most of the time, and the consumed Read Capacity Units (RCU) are really low for most of the day. However, there's a scheduled job executed around midnight which consumes about 10x the provisioned RCU, and since auto scaling takes some time to adjust the capacity, there are a lot of throttled read requests. Furthermore, I suspect my requests are not being completed (though I can't find any exceptions in my error log).
In order to handle this situation, I've considered increasing the provisioned RCU using the AWS API (updateTable) but calculating the number of RCU my application needs may not be straightforward.
So my second guess was to retry failed requests and simply wait for auto scaling to increase the provisioned RCU. As pointed out by the AWS docs and some Stack Overflow answers (particularly about ProvisionedThroughputExceededException):
The AWS SDKs for Amazon DynamoDB automatically retry requests that receive this exception. So, your request is eventually successful, unless the request is too large or your retry queue is too large to finish.
I've read similar questions (this one, this one and this one) but I'm still confused: is this exception raised only after the automatic retries (when the request is too large or the retry queue is too large to finish), or before them?
Most important: is that the exception I should be expecting in my context? (So I can catch it and retry until auto scaling increases the RCU.)
Yes.
Every time your application sends a request that exceeds your capacity you get a ProvisionedThroughputExceededException from DynamoDB. However, your SDK handles this for you and retries. The default retry delay starts at 50ms, the default number of retries is 10, and the backoff is exponential.
This means you get retries at:
50ms
100ms
200ms
400ms
800ms
1.6s
3.2s
6.4s
12.8s
25.6s
If after the 10th retry your request has still not succeeded, the SDK passes the ProvisionedThroughputExceededException back to your application and you can handle it how you like.
You could handle it by increasing the provisioned throughput, but another option would be to change the default retry behaviour when you create the DynamoDB connection. For example:
new AWS.DynamoDB({maxRetries: 13, retryDelayOptions: {base: 200}});
This would mean you retry 13 times with an initial delay of 200ms, so the delay before the final retry grows to 819.2s (200ms × 2^12) rather than 25.6s.
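In application code, catching the exception once the SDK gives up looks something like this (a sketch, assuming a DocumentClient named dynamodb; the v2 SDK exposes the error name on err.code, and the fallback behaviour here is just an example):

async function safeQuery(params) {
  try {
    return await dynamodb.query(params).promise();
  } catch (err) {
    if (err.code === 'ProvisionedThroughputExceededException') {
      // All SDK retries are exhausted at this point: shed load, queue the
      // work, or raise the provisioned throughput before trying again.
      return null; // placeholder fallback
    }
    throw err;
  }
}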

How to do capacity control in this case?

My app reads data from DynamoDB, which has pre-configured read capacity that limits read throughput. I'd like to control my queries so they don't reach the limit. Here is how I'm doing it now:
const READ_CAPACITY = 80

async function query(params) {
  const consumed = await getConsumedReadCapacity()
  if (consumed > READ_CAPACITY) {
    await sleep((consumed - READ_CAPACITY) * 1000 / READ_CAPACITY)
  }
  const result = await dynamoDB.query(params).promise()
  // requires ReturnConsumedCapacity: 'TOTAL' in params
  await addConsumedReadCapacity(result.ConsumedCapacity.CapacityUnits)
  return result.Items
}

async function getConsumedReadCapacity() {
  return redis.get(`read-capacity:${Math.floor(Date.now() / 1000)}`)
}

async function addConsumedReadCapacity(n) {
  return redis.incrby(`read-capacity:${Math.floor(Date.now() / 1000)}`, n)
}
As you can see, a query first checks the currently consumed read capacity; if it doesn't exceed READ_CAPACITY, it performs the query and adds the consumed read capacity to the counter.
The problem is that this code runs on several servers, so there are race conditions: the consumed > READ_CAPACITY check passes, but before dynamoDB.query executes, queries from other processes on other servers have already exhausted the read capacity. How can I improve this?
Some things to try instead of avoiding hitting capacity limits...
Try, then back-off
From DynamoDB error handling:
ProvisionedThroughputExceededException: The AWS SDKs for DynamoDB automatically retry requests that receive this exception. Your request is eventually successful, unless your retry queue is too large to finish. Reduce the frequency of requests, using Error Retries and Exponential Backoff.
Burst
From Best Practices for Tables:
DynamoDB provides some flexibility in the per-partition throughput provisioning. When you are not fully utilizing a partition's throughput, DynamoDB retains a portion of your unused capacity for later bursts of throughput usage. DynamoDB currently retains up to five minutes (300 seconds) of unused read and write capacity. During an occasional burst of read or write activity, these extra capacity units can be consumed very quickly—even faster than the per-second provisioned throughput capacity that you've defined for your table.
DynamoDB Auto Scaling
From Managing Throughput Capacity Automatically with DynamoDB Auto Scaling:
DynamoDB auto scaling uses the AWS Application Auto Scaling service to dynamically adjust provisioned throughput capacity on your behalf, in response to actual traffic patterns. This enables a table or a global secondary index to increase its provisioned read and write capacity to handle sudden increases in traffic, without throttling. When the workload decreases, Application Auto Scaling decreases the throughput so that you don't pay for unused provisioned capacity.
Cache in SQS
Some AWS customers have implemented a system where, if throughput is exceeded, they store the data in an Amazon SQS queue. They then have a process that retrieves the data from the queue and inserts into the table later, when there is less demand on throughput. This allows the DynamoDB table to be provisioned based on average throughput rather than peak throughput.
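A hedged sketch of that SQS pattern (the queue URL env var, table name, and item shape are all assumptions):

const AWS = require('aws-sdk');
const sqs = new AWS.SQS();
const dynamodb = new AWS.DynamoDB.DocumentClient();

async function writeItem(item) {
  try {
    await dynamodb.put({ TableName: 'MyTable', Item: item }).promise();
  } catch (err) {
    if (err.code !== 'ProvisionedThroughputExceededException') throw err;
    // Park the write in SQS; a worker drains the queue when demand drops,
    // so the table can be provisioned for average rather than peak load.
    await sqs.sendMessage({
      QueueUrl: process.env.DEFERRED_WRITES_QUEUE_URL, // assumed env var
      MessageBody: JSON.stringify(item)
    }).promise();
  }
}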

How to optimize Postgresql max_connections and node-postgres connection pool?

In brief, I am having trouble supporting more than 5000 read requests per minute from a data API leveraging PostgreSQL, Node.js, and node-postgres. The bottleneck appears to be between the API and the DB. Here are the implementation details.
I'm using an AWS PostgreSQL RDS database instance (m4.4xlarge - 64 GB mem, 16 vCPUs, 350 GB SSD, no provisioned IOPS) for a Node.js powered data API. By default the RDS's max_connections=5000. The Node API is load-balanced across two clusters with 4 processes each (2 EC2s with 4 vCPUs running the API with PM2 in cluster mode). I use node-postgres to bind the API to the PostgreSQL RDS, and am attempting to use its connection pooling feature. Below is a sample of my connection pool code:
const { Pool } = require('pg');

var pool = new Pool({
  user: settings.database.username,
  password: settings.database.password,
  host: settings.database.readServer,
  database: settings.database.database,
  max: 25,
  idleTimeoutMillis: 1000
});

/* Example of pool usage */
pool.query('SELECT my_column FROM my_table', function(err, result){
  /* Callback code here */
});
Using this implementation and testing with a load tester, I can support about 5000 requests over the course of one minute, with an average response time of about 190ms (which is what I expect). As soon as I fire off more than 5000 requests per minute, my response time increases to over 1200ms in the best of cases, and in the worst of cases the API begins to time out frequently. Monitoring indicates that for the EC2s running the Node.js API, CPU utilization remains below 10%. Thus my focus is on the DB and the API's binding to the DB.
I have attempted to increase (and decrease, for that matter) the node-postgres "max" connections setting, but there was no change in the API response/timeout behavior. I've also tried provisioned IOPS on the RDS, but no improvement. Also, interestingly, I scaled the RDS up to m4.10xlarge (160 GB mem, 40 vCPUs), and while the RDS CPU utilization dropped greatly, the overall performance of the API worsened considerably (it couldn't even support the 5000 requests per minute that I was able to with the smaller RDS).
I'm in unfamiliar territory in many respects and am unsure of how to best determine which of these moving parts is bottlenecking API performance at over 5000 requests per minute. As noted, I have attempted a variety of adjustments based on a review of the PostgreSQL configuration documentation and node-postgres documentation, but to no avail.
If anyone has advice on how to diagnose or optimize I would greatly appreciate it.
UPDATE
After scaling up to m4.10xlarge, I performed a series of load tests, varying the number of requests/min and the max number of connections in each pool. Here are some screen captures of the monitoring metrics:
In order to support more than 5k requests while maintaining the same response rate, you'll need better hardware...
The simple math states that:
5000 requests × 190ms avg = 950,000ms of DB time per minute; spread across 16 cores that is roughly 59,000ms of work per core in a 60,000ms window
which basically means your system was highly loaded: each core was busy for nearly the entire minute.
(I'm guessing you had some spare CPU as some time was lost on networking)
Now, the really interesting part in your question comes from the scale up attempt: m4.10xlarge (160 GB mem, 40 vCPUs).
The drop in CPU utilization indicates that the scale-up freed up DB time resources, so you need to push more requests!
Two suggestions:
Try increasing the connection pool to max: 70 and look at the network traffic (depending on the amount of data, you might be saturating the network).
Also, are your requests to the DB async from the application side? Make sure your app can actually push more requests (a quick check follows below).
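For instance, a quick concurrency check (pool is the node-postgres pool from the question; 20 is an arbitrary batch size):

// Fire 20 queries concurrently; if the total time scales linearly with
// the count, the application is serializing its DB calls.
async function concurrencyCheck() {
  const t0 = Date.now();
  await Promise.all(
    Array.from({ length: 20 }, () =>
      pool.query('SELECT my_column FROM my_table LIMIT 1')
    )
  );
  console.log(`20 concurrent queries took ${Date.now() - t0}ms`);
}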
The best way is to make use of a separate Pool for each API call, based on the call's priority:
const highPriority = new Pool({max: 20}); // for high-priority API calls
const lowPriority = new Pool({max: 5}); // for low-priority API calls
Then you just use the right pool for each of the API calls, for optimum service/connection availability.
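For example, with an Express app (the routes and queries are illustrative):

const express = require('express');
const app = express();

// Latency-sensitive lookups go through the larger pool...
app.get('/products/:id', async (req, res) => {
  const { rows } = await highPriority.query(
    'SELECT my_column FROM my_table WHERE id = $1',
    [req.params.id]
  );
  res.json(rows);
});

// ...while reporting-style calls can wait on the smaller one.
app.get('/reports/daily', async (req, res) => {
  const { rows } = await lowPriority.query('SELECT my_column FROM my_table');
  res.json(rows);
});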
Since you are interested in read performance, you can set up replication between two (or more) PostgreSQL instances, and then use pgpool II to load balance between the instances.
Scaling horizontally means you won't start hitting the max instance sizes at AWS if you decide next week you need to go to 10,000 concurrent reads.
You also start to get some HA in your architecture.
--
Many times people will use pgbouncer as a connection pooler even if they have one built into their application code already. pgbouncer works really well and is typically easier to configure and manage than pgpool, but it doesn't do load balancing. I'm not sure it would help you very much in this scenario, though.
