How to scale S3 to thousands of requests per second? - node.js

AWS S3 documentation states
(https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html):
Amazon S3 automatically scales to high request rates. For example, your application can achieve at least 3,500 PUT/POST/DELETE and 5,500 GET requests per second per prefix in a bucket.
To test this I have the following NodeJS code (using aws-sdk) which asynchronously initiates 1000 uploads of zero bytes (hence, simply adding empty entries to the bucket). There is a timer to measure the throughput:
var t0 = new Date().getTime()
for (var i = 0; i < 1000; i++) {
  const s3 = new AWS.S3()
  const id = uuid()   // block-scoped so each callback logs its own key
  console.log('Uploading ' + id)
  s3.upload({
    Bucket: bucket,
    Body: '',
    Key: 'test/' + id
  },
  function (err, data) {
    if (data) console.log('Uploaded ' + id + ' ' + (new Date().getTime() - t0))
    else console.log('Error')
  })
}
It takes approximately 25 seconds to complete all the upload requests. That is nowhere near the purported 3,500 requests per second; it works out to roughly 40 requests per second.
I have approximately 1 MB/s of network upload bandwidth, and network stats show that the link is only about 25% saturated for most of the time. CPU utilisation is also low.
So the question is:
How can I scale S3 upload throughput to achieve something near the 3500 requests per second that can apparently be achieved?
EDIT:
I modified the code like this:
var t0 = new Date().getTime()
for (var i = 0; i < 1000; i++) {
  const s3 = new AWS.S3()
  // Prefix each key with one of 26 letters so the keys spread over 26 prefixes
  const id = String.fromCharCode('a'.charCodeAt(0) + (i % 26)) + uuid()
  console.log('Uploading ' + id)
  s3.upload({
    Bucket: bucket,
    Body: '',
    Key: id
  },
  function (err, data) {
    if (data) console.log('Uploaded ' + id + ' ' + (new Date().getTime() - t0))
    else console.log('Error')
  })
}
This uses 26 different prefixes, which the AWS documentation claims should scale the throughput by a factor of 26.
"It is simple to increase your read or write performance exponentially. For example, if you create 10 prefixes in an Amazon S3 bucket to parallelize reads, you could scale your read performance to 55,000 read requests per second."
However, no difference in throughput is apparent. There is some difference in behaviour, in that the requests appear to complete in a more parallel rather than sequential fashion, but the total completion time is just about the same.
Finally, I tried running four instances of the application as separate processes from bash (4 processes, 4 cores, 4 x 1000 requests). Despite the added parallelism from using multiple cores, the total execution time was about 80 seconds, so the throughput did not scale.
for i in {0..3}; do node index.js & done
I wonder if S3 rate-limits individual clients/IPs (although this does not appear to be documented)?

I have a few things to mention before I give a straight answer to your question.
First, I did an experiment at one point, and I achieved 200,000 PUT/DELETE requests in about 25 minutes, which is a little over 130 requests per second. The objects I was uploading were about 10 kB each. (I also had ~125,000 GET requests in the same time span, so I'm sure that if I had only been doing PUTs, I could have achieved even higher PUT throughput.) I achieved this on an m4.4xlarge instance (16 vCPUs, 64 GB of RAM) running in the same AWS region as the S3 bucket.
To get more throughput, use more powerful hardware and minimize the number of network hops and potential bottlenecks between you and S3.
S3 is a distributed system. (Their documentation says the data is replicated to multiple AZs.) It is designed to serve requests from many clients simultaneously (which is why it’s great for hosting static web assets).
Realistically, if you want to test the limits of S3, you need to go distributed too by spinning up a fleet of EC2 instances or running your tests as a Lambda Function.
Edit: S3 does not guarantee the latency of serving your requests. One reason might be that each request can have a different payload size. (A GET request for a 10 B object will be much faster than one for a 10 MB object.)
You keep mentioning the time to serve a request, but that doesn’t necessarily correlate to the number of requests per second. S3 can handle thousands of requests per second, but no single consumer laptop or commodity server that I know of can issue thousands of separate network requests per second.
Furthermore, the total execution time is not necessarily indicative of performance because when you are sending stuff over a network, there is always the risk of network delays and packet loss. You could have one unlucky request that has a slower path through the network or that request might just experience more packet loss than the others.
You need to carefully define what you want to find out and then carefully determine how to test it correctly.
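As one way to set up such a test, here is a minimal sketch (assuming aws-sdk v2, credentials already configured, a bucket name supplied via process.env.BUCKET, and an arbitrary concurrency of 100) that keeps a fixed number of zero-byte uploads in flight and reports completed requests per second:
const AWS = require('aws-sdk')
const { v4: uuid } = require('uuid')

const s3 = new AWS.S3()
const bucket = process.env.BUCKET   // assumed: bucket name from the environment
const TOTAL = 1000
const CONCURRENCY = 100             // arbitrary starting point; tune experimentally

async function worker(counter) {
  while (counter.next < TOTAL) {
    counter.next++
    await s3.putObject({ Bucket: bucket, Key: 'test/' + uuid(), Body: '' }).promise()
    counter.done++
  }
}

async function run() {
  const counter = { next: 0, done: 0 }
  const t0 = Date.now()
  await Promise.all(Array.from({ length: CONCURRENCY }, () => worker(counter)))
  const seconds = (Date.now() - t0) / 1000
  console.log(counter.done + ' uploads in ' + seconds.toFixed(1) + 's = ' +
    (counter.done / seconds).toFixed(0) + ' req/s')
}

run().catch(console.error)
Raising CONCURRENCY (and running several such processes on instances close to S3) is how you would probe the per-prefix limits, rather than measuring the latency of individual requests.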

Another thing you should look at is the HTTPS agent used.
It used to be the case (and probably still is) that the AWS SDK uses Node's global HTTPS agent by default. If you're using an agent that reuses connections, it's probably HTTP/1.1 and probably has pipelining disabled for compatibility reasons.
Take a look with a packet sniffer like Wireshark to check whether or not multiple connections outward are being made. If only one connection is being made, you can specify the agent in httpOptions.
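For example, a minimal sketch assuming aws-sdk v2 (the maxSockets value of 200 is an arbitrary example):
const AWS = require('aws-sdk')
const https = require('https')

// Reuse TLS connections and allow more concurrent sockets than the SDK default.
const agent = new https.Agent({ keepAlive: true, maxSockets: 200 })

const s3 = new AWS.S3({
  httpOptions: { agent }
})
The same agent can also be applied globally with AWS.config.update({ httpOptions: { agent } }).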

Related

Saving many files to AWS S3 bucket in parallel is much slower than chunking and saving

I have a lambda function that reads a JSON file from S3, which contains around 500 objects.
As part of this lambda job, it takes those 500 objects and saves them separately. Each object is tiny, just around 500 bytes.
My code is currently handling it by saving all files in parallel and letting S3 handle the load. The S3 docs suggest that it supports at least 3,500 PUT requests per second.
// saveFile calls S3 under the hood (S3.upload)
await Promise.all(
  transactions.map((transaction) => saveFile(transaction))
);
I enabled logging for the AWS object (aws-sdk) and wanted to see what the timing looks like. I was surprised to see that the time for each upload keeps increasing, not in correlation with the file size (which is small) but with the number of objects I'm trying to upload. It got as high as 8 seconds for a single upload operation to complete.
I tried doing the same thing, but this time handling the chunking myself:
// chunk() splits the array into groups of 50 (e.g. lodash's chunk)
for (const batch of chunk(transactions, 50)) {
  await Promise.all(
    batch.map((transaction) => saveFile(transaction))
  );
}
Running the new code results in a significantly faster execution time. The upload time was around 400ms this time.
This weird result got me thinking that maybe something is being throttled or queued on my side. Then I suddenly remembered that the AWS SDK limits the sockets of the underlying https agent to 50 by default. So, following their docs, I tried changing it to a high number, thinking that because I'm issuing 500 requests in parallel, they are all just stuck in a queue and affecting each other somehow.
const aws = require('aws-sdk');
const https = require('https');

const agent = new https.Agent({ maxSockets: 100000, keepAlive: true });
aws.config.update({
  httpOptions: { agent },
});

// ...

await Promise.all(
  transactions.map((transaction) => saveFile(transaction))
);
Unfortunately, it did not work; the high request times are still there. I've tried many values for the maxSockets attribute, and even tried plugging in an http agent instead of an https agent, but of course that failed with an error about accessing an https resource.
Any idea what could be the issue and how can I solve it?

will I hit maximum writes per second per database if I make a document using Promise.all like this?

I am developing an app, and I want to send a message to all my users' inboxes. The code in my Cloud Functions looks like this:
const query = db.collection(`users`)
  .where("lastActivity", "<=", now)
  .where("lastActivity", ">=", last30Days)

const usersQuerySnapshot = await query.get()
const promises = []

usersQuerySnapshot.docs.forEach(userSnapshot => {
  const user = userSnapshot.data()
  const userID = user.userID
  // set promise to create data in user inbox
  const p1 = db.doc(`users/${userID}/inbox/${notificationID}`).set(notificationData)
  promises.push(p1)
})

return await Promise.all(promises)
there is a limit in Firebase:
Maximum writes per second per database: 10,000 (up to 10 MiB per second)
Say I send a message to 25K users (i.e. create a document for each of 25K users): how long will that await Promise.all(promises) take? I am worried that the operation will complete in less than 1 second, and I don't know whether this code will hit that limit; I am not sure what write rate it produces.
And if I would hit that limit, how do I spread the writes out over time? Could you please give me a clue? Sorry, I am a newbie.
If you want to throttle the rate at which document writes happen, you should probably not blindly kick off very large batches of writes in a loop. While there is no guarantee how fast they will occur, it's possible that you could exceed the 10K/second/database limit (depending on how good the client's network connection is, and how fast Firestore responds in general). Over a mobile or web client, I doubt that you'll exceed the limit, but on a backend that's in the same region as your Firestore database, who knows - you would have to benchmark it.
Your client code could simply throttle itself with some simple logic that measures its progress.
If you have a lot of documents to write as fast as possible, and you don't want to throttle your client code, consider throttling them as individual items of work using a Cloud Tasks queue. The queue can be configured to manage the rate at which the queue of tasks will be executed. This will drastically increase the amount of work you have to do to implement all these writes, but it should always stay in a safe range.
You could use e.g. p-limit to reduce promise concurrency in the general case, or preferably use batched writes.
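A rough sketch of the p-limit approach, adapted to the code above (the concurrency of 500 is an arbitrary example, not a documented Firestore threshold; p-limit v3 is assumed since newer versions are ESM-only):
const pLimit = require('p-limit')

// Allow at most 500 set() calls to be in flight at any one time.
const limit = pLimit(500)

const promises = usersQuerySnapshot.docs.map(userSnapshot => {
  const { userID } = userSnapshot.data()
  return limit(() =>
    db.doc(`users/${userID}/inbox/${notificationID}`).set(notificationData)
  )
})

return await Promise.all(promises)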

How to know how many requests to make without knowing amount of data on server

I have a NodeJS application where I need to fetch data from another server (3rd-party, I have no control over it). The server requires you to specify a maximum number of entries to return (pageSize) along with an offset. So, for example, if there are 100 entries on the server, I could request a pageSize of 100 with offset 0, or a pageSize of 10 and make 10 requests with offsets 0, 1, 2, and so on, running them with Promise.all (timing shows that multiple smaller concurrent requests are faster).
var pageSize = 100;
var offsets = [...Array(Math.ceil(totalItems / pageSize)).keys()];
await Promise.all(offsets.map(async i => {
  // make request with pageSize and offset i
}));
The only problem is that the number of entries changes, and there is no property returned by the server indicating the total number of items. I could do something like this and while loop until the server comes back empty:
var offset = 0;
var pageSize = 100;
var data = [];
var response = await /* make request with pageSize and offset */
while (/* response is not empty */) {
  data.push(response);
  offset++;
  // send another request with the new offset and update response
}
But that isn't as efficient/quick as sending multiple concurrent requests like above.
Is there any good way around this that can deal with the dynamic length of the data on the server?
Without the server giving you some hint about how many items there are, there's not a lot you can do to parallelize the requests: you don't really want to send more requests than are needed, and you don't want to artificially request small batches of items just so you can run more requests in parallel.
You could run some tests to find a practical limit: what is the largest number of items the server and your client seem to be OK with you requesting (100? 1,000? 10,000? 100,000?)? Request that many to start with, and if the response indicates there is more data after that, send another request of a similar size (sketched below).
The main idea is to minimize the number of separate requests and maximize the data you get in a single call. That should be more efficient than more parallel requests, each asking for fewer items, because it's ultimately the same server and the same data store that has to provide all the data, so the fewest round trips in the fewest separate requests is probably best.
But, some of this is dependent upon the scale and architecture of the target host so experiments will be required to see what practically works best.
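As a rough sketch of that strategy (fetchPage(pageSize, offset) is a hypothetical wrapper around the third-party request, and the sketch assumes the server signals the end by returning fewer items than requested):
// fetchPage(pageSize, offset) is a hypothetical wrapper around the 3rd-party API;
// offset here is a page index, matching the question's usage.
async function fetchAll(fetchPage, pageSize = 1000) {
  const all = [];
  let offset = 0;
  while (true) {
    const page = await fetchPage(pageSize, offset);
    all.push(...page);
    if (page.length < pageSize) break; // a short (or empty) page means no more data
    offset++;
  }
  return all;
}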

How to perform massive data uploads to firebase firestore

I have about 300 MB of data (~180k JSON objects) that gets updated once every 2-3 days.
This data is divided into three "collections" that I must keep up to date.
I decided to take the Node.js route, but any solution in a language I know (Java, Python) is welcome.
Whenever I perform a batch set using the Node.js firebase-admin client, not only does it consume an aberrant amount of RAM (about 4-6 GB!), but it also tends to crash with errors that don't have a clear cause (I got to page 4 of a Google search without a meaningful answer).
My code is frankly simple, this is it:
var collection = db.collection("items");
var batch = db.batch();

array.forEach(item => {
  var ref = collection.doc(item.id);
  batch.set(ref, item);
});

batch.commit().then((res) => {
  console.log("YAY", res);
});
I haven't found documentation saying whether there is a limit on the number of writes in a given span of time (I'd expect 50-60k writes to be easy for a backend the size of Firebase), and, as noted, RAM usage climbs to around 4-6 GB.
I can confirm that when the errors are thrown, or the RAM usage clogs my laptop (whichever happens first), I am still at less than 1-4% of my daily usage quotas, so quotas are not the issue.
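One common workaround, sketched below against the firebase-admin API used above, is to split the array into chunks and commit them sequentially: 500 operations has historically been the documented limit for a single batched write, and awaiting each commit keeps only one batch in memory at a time.
const BATCH_SIZE = 500; // historically the documented maximum operations per batched write

async function writeInBatches(db, array) {
  const collection = db.collection("items");
  for (let start = 0; start < array.length; start += BATCH_SIZE) {
    const batch = db.batch();
    for (const item of array.slice(start, start + BATCH_SIZE)) {
      batch.set(collection.doc(item.id), item);
    }
    await batch.commit(); // wait for each chunk before building the next one
    console.log("Committed items", start, "to", Math.min(start + BATCH_SIZE, array.length) - 1);
  }
}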

Performance bottlenecks when using async-await with Azure Storage API

I'm hitting a performance bottleneck on insertion requests using the Azure Table Storage API. I'm trying to reach a rate of at least 1 insert per 30 ms into a table (unique partition keys).
What is the recommended way to achieve this request rate and how can I fix my program to overcome my bottleneck?
I have a test program that inserts into the Azure table at roughly 1 insert per 30 ms. With this test program, the latency continuously increases and requests eventually take more than 15 seconds per insert.
Below is the code for my test program. It creates async tasks that log the time it takes to await on the CloudTable ExecuteAsync method. Unfortunately, the insertion latency just grows as the program runs.
List<Task> tasks = new List<Task>();
while (true)
{
    Thread.Sleep(30);
    tasks = tasks.Where(t => t.IsCompleted == false).ToList(); // Remove completed tasks
    DynamicTableEntity dte = new DynamicTableEntity() { PartitionKey = Guid.NewGuid().ToString(), RowKey = "abcd" };
    tasks.Add(AddEntityToTableAsync(dte));
}
...
public static async Task<int> AddEntityToTableAsync<T>(T entity) where T : class, ITableEntity
{
    Stopwatch timer = Stopwatch.StartNew();
    // cloudTable is assumed to be a shared CloudTable field initialized elsewhere
    var tableResult = await cloudTable.ExecuteAsync(TableOperation.InsertOrReplace(entity));
    timer.Stop();
    Console.WriteLine($"Table Insert Time: {timer.ElapsedMilliseconds}, Inserted {entity.PartitionKey}");
    return tableResult.HttpStatusCode;
}
I thought that it might be my test program running out of threads for the outgoing Network IO, so I tried monitoring the available thread counts during the program's execution:
ThreadPool.GetAvailableThreads(out workerThreads, out completionIoPortThreads);
It showed that nearly all the IO threads were available during execution (just in case, I even tried increasing the available threads, but that had no effect on the issue).
As I understand it, for async tasks, the completion port threads don't get "reserved" until there's data on them to process, so I started thinking that there might be an issue with my connection to Azure Table Storage.
However, I confirmed that was not the case by lowering the request rate (1 insert / 100ms) and launching 30 instances of my test program on the same machine. With 30 instances, I was able to maintain a stable ~90ms / insert without any increase in latency.
What can I do to enable a single test program to achieve performance similar to what I was getting when running 30 programs on the same machine?
The test program was hitting the System.Net.ServicePointManager.DefaultConnectionLimit, whose default value is 2.
Increasing the limit to 100 (for example, by setting ServicePointManager.DefaultConnectionLimit = 100; at startup) fixes the problem and allows the single program to achieve the same speed as the 30-program scenario.

Resources