In a heavily loaded system handling a very high number of requests, where the Hazelcast client and server run on different VMs:
1) Which approach gives better response times:
fetching data from the IMap using getAll(),
or
fetching data from the IMap by iterating over the keys, calling getAsync() for each, and then using the futures to collect the retrieved data?
2) In the case of getAsync(), when does the retrieval of data actually happen: when the future is invoked, or when getAsync() is called?
3) Which of the two should perform better when backoff is disabled?
As a general rule of thumb, you want to minimize network trips in a distributed system. So getAll() is better: it sends one operation per partition, whereas get() sends one operation over the network for each key.
2) The operation is dispatched when getAsync() is called; it may or may not have completed by the time you call future.get(). If the result is not there yet, future.get() blocks until it arrives.
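As a rough illustration of the difference, here is a minimal sketch using the Node.js hazelcast-client package (in the Java client the per-key variant would be map.getAsync(key) returning a future); the map name is hypothetical and exact signatures vary by client version:

```typescript
import { Client } from 'hazelcast-client';

async function main() {
  const client = await Client.newHazelcastClient();
  const map = await client.getMap<string, string>('products'); // name is hypothetical

  const keys = ['k1', 'k2', 'k3'];

  // Option A: one getAll() call. The client groups the keys by partition
  // and sends at most one operation per partition.
  const batched = await map.getAll(keys);

  // Option B: one get() per key. Each call is one network operation; they
  // run concurrently and are collected with Promise.all().
  const individual = await Promise.all(keys.map((k) => map.get(k)));

  console.log(batched, individual);
  await client.shutdown();
}

main().catch(console.error);
```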
I just read this article from Node.js: Don't Block the Event Loop
The Ask
I'm hoping that someone can read over the use case I describe below and tell me whether I'm understanding how the event loop gets blocked, and whether I'm actually blocking it. Also, any tips on how I can find this information out for myself would be useful.
My use case
I think I have a use case in my application that could potentially cause problems. I have a feature that lets a group add members to its roster. Each member that doesn't represent an existing system user (the common case) gets an account created, including a dummy password.
The password is hashed with argon2 (using the default hash type), which means that even before I need to wait on a DB promise to resolve (a Prisma transaction), I have to wait for each member's password hash to be generated.
I'm using Prisma for the ORM and Sendgrid for the email service and no other external packages.
A take-away that I get from the article is that this is blocking the event loop. Since there could potentially be hundreds of records generated (such as importing contacts from a CSV or cloud contact service), this seems significant.
To sum up what the route in question does, including some details omitted before:
Remove duplicates (requires one DB request & then some synchronous checking)
Check remaining for existing user
For non-existing users:
Synchronously create many records & push each to a separate array. One of these records requires async password generation for each non-existing user
Once the arrays are populated, send a DB transaction with all records
Once the transaction is cleared, create invitation records for each member
Once the invitation records are created, send emails in a MailData[] through SendGrid.
Clearly, there are quite a few tasks that must be done sequentially. If it matters, the asynchronous functions are also nested; the full chain from the controller is: updateRoster calls createUsers calls createInvites calls sendEmails.
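For reference, a condensed sketch of the hashing step above, assuming the npm argon2 package (the member shape and helper name are hypothetical). Note that argon2's hash() is async and runs on libuv's threadpool, so each hash is CPU work done off the main thread, though hundreds of concurrent hashes can still saturate the pool:

```typescript
import { hash } from 'argon2';
import { randomUUID } from 'node:crypto';

// Generate a dummy password hash for every member that isn't an existing user.
async function hashDummyPasswords(members: { email: string }[]) {
  return Promise.all(
    members.map(async (m) => ({
      email: m.email,
      // Default hash type (argon2id in current versions of the package).
      passwordHash: await hash(randomUUID()),
    }))
  );
}
```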
There are architectural patterns aimed at avoiding the problems caused by potentially long-running operations. Note that while your example is specific, any long-running process could be harmful here in the same way.
The first obvious pattern is clustering. If your app is served by multiple concurrent, independent event loops in a cluster, blocking one, ten, or even a thousand of them can be insignificant, provided the app is scaled to handle it.
Imagine a scenario where you have 10 concurrent loops and one is blocked for a long time, but the 9 remaining loops are still serving short requests. Chances are, users would not even notice the temporary bottleneck caused by the one long-running request.
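A minimal sketch of that setup with Node's built-in cluster module (the port and handler are placeholders):

```typescript
import cluster from 'node:cluster';
import http from 'node:http';
import os from 'node:os';

if (cluster.isPrimary) {
  // One worker per core: each worker runs its own event loop, so a
  // long-running request blocks only the worker that happens to serve it.
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();
} else {
  http
    .createServer((req, res) => res.end(`handled by worker ${process.pid}\n`))
    .listen(3000); // all workers share the primary's listen queue
}
```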
Another, more general pattern is a separate service for long-running processes, or Command Query Responsibility Segregation (I bring up CQRS because the pattern's description may introduce interesting ideas you're not yet familiar with).
In this approach, long-running operations are not handled directly by the backend servers. Instead, the backend servers use a message queue to send requests to yet another service layer of your app, one dedicated solely to running these long-running jobs. The message queue is configured with a specific throughput, so if multiple long-running requests arrive in a short time they are queued: some of them may be delayed, but your resources always stay under control. The backend that sends requests to the message queue doesn't wait synchronously; you need another form of return communication.
This auxiliary process service can be maintained and scaled independently. The important part is that the service is never accessed directly from the frontend; it always sits behind a message queue with controlled throughput.
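As a sketch of what this could look like for the roster example, here is the producer and the consumer side with BullMQ over Redis (the queue name, job payload, and concurrency value are assumptions, not a prescription):

```typescript
import { Queue, Worker } from 'bullmq';

const connection = { host: 'localhost', port: 6379 };

// Backend server: enqueue the long-running job and return immediately.
const rosterQueue = new Queue('roster-import', { connection });

export async function requestImport(groupId: string, members: object[]) {
  await rosterQueue.add('import', { groupId, members });
}

// Separate worker process: throughput is controlled via `concurrency`.
new Worker(
  'roster-import',
  async (job) => {
    const { groupId, members } = job.data;
    // ...hash passwords, run the Prisma transaction, send the invites...
  },
  { connection, concurrency: 2 }
);
```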
Note that while the second approach is often implemented in real-life systems and solves most issues, it can still fail on some edge cases, e.g. when long-running requests arrive faster than they are handled and the queue grows infinitely.
Such cases require careful maintenance and you either scale your app to handle the traffic or you introduce other rules that prevent users from running long processes too often.
I have an API endpoint which creates and sends a few transactions in strict sequence. Because I don't wait for the results of these transactions, I specify a nonce for each of them so that they execute in the right order.
This endpoint is built as an AWS Lambda function, so with many concurrent requests the Lambda runs in concurrent instances. Several of those instances can read the same nonce (I'm using the eth.getTransactionCount method to get the latest transaction count) and send transactions with the same nonce. I then receive errors because, instead of creating new transactions, the node tries to replace existing ones.
Basically, I need a way to check if a nonce is already taken right before the transaction sending or somehow reserve a nonce number (is it even possible?).
web3's getTransactionCount() only returns the number of already-mined transactions, and there's currently no way to get the highest pending nonce (for an address) using web3.
So you'll need to store your pending nonces in a separate DB (e.g. Redis). Each Lambda run will need to access this DB to get the highest pending nonce, calculate the one it's going to use (probably just +1), and store that number in the DB so that other instances can't use it anymore.
Mind that it's recommended to implement a lock (Redis, DynamoDB) to prevent multiple app instances from accessing the DB and claiming the same value at the same time.
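A minimal sketch of that reservation with Redis, where the atomicity of INCR replaces an explicit lock for the claim itself (the key name and seeding logic are assumptions; web3 1.x API):

```typescript
import Redis from 'ioredis';
import Web3 from 'web3';

const redis = new Redis();
const web3 = new Web3(process.env.RPC_URL!);

// Atomically reserve the next nonce for an address. INCR is atomic in
// Redis, so two concurrent Lambda instances can never get the same value.
async function reserveNonce(address: string): Promise<number> {
  const key = `nonce:${address}`;
  if ((await redis.exists(key)) === 0) {
    // Seed the counter once from the chain (count of mined transactions).
    const mined = await web3.eth.getTransactionCount(address);
    // 'NX' keeps the seed from clobbering a counter another instance set.
    await redis.set(key, mined - 1, 'NX');
  }
  return redis.incr(key); // first call returns `mined`, i.e. the next nonce
}
```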
Basically, I need a way to check if a nonce is already taken right before the transaction sending or somehow reserve a nonce number (is it even possible?).
You should not.
Instead, you should manage the nonce in your internal database (SQL, etc.), which provides atomic counters and supports multiple readers and writers. You only fall back to the network-provided nonce if 1) your system has failed, or 2) you need to manually reset it.
Here is example code for Web3.py and SQLAlchemy.
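That linked example isn't reproduced here, but the atomic-counter idea can be sketched in TypeScript with node-postgres as well (the table and column names are hypothetical). A single UPDATE ... RETURNING is atomic, so concurrent callers always receive distinct nonces:

```typescript
import { Pool } from 'pg';

const pool = new Pool(); // connection settings come from PG* env vars

// Assumes a table: nonces(address TEXT PRIMARY KEY, next_nonce BIGINT).
async function reserveNonce(address: string): Promise<number> {
  const { rows } = await pool.query(
    `UPDATE nonces
        SET next_nonce = next_nonce + 1
      WHERE address = $1
  RETURNING next_nonce - 1 AS nonce`,
    [address]
  );
  return Number(rows[0].nonce);
}
```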
I have an API which allows other microservices to call on to check whether a particular product exists in the inventory. The API takes in only one parameter which is the ID of the product.
The API is served through API Gateway in Lambda and it simply queries against a Postgres RDS to check for the product ID. If it finds the product, it returns the information about the product in the response. If it doesn't, it just returns an empty response. The SQL is basically this:
SELECT * FROM inventory where expired = false and product_id = request.productId;
However, the problem is that many services are calling this particular API very heavily to check the existence of products. Not only that, the calls often come in bursts. I assume those services loop through a list of product IDs and check for their existence individually, hence the burst.
The number of concurrent calls on the API has resulted in many queries to the database. The rate can burst beyond 30 queries per second, and there can be a few hundred thousand requests to fulfil. The queries are mostly identical except for the product ID in the where clause. The column is indexed and a query takes an average of only 5-8 ms to complete. Still, the connection to the database occasionally times out when the rate gets too high.
I'm using Sequelize as my ORM, and the error I get when it times out is SequelizeConnectionAcquireTimeoutError. There is a good chance the burst rate was high enough to max out the pool as well.
Some options I have considered:
Using a cache layer. But I have noticed that, most of the time, 90% of the product IDs in the requests are not repeated. This means that 90% of the time it would be a cache miss and the request would still hit the database.
Auto-scaling the database. But because the calls are bursty and I don't know when they may come, the autoscaling won't complete in time to avoid the timeouts. Moreover, the query is a very simple select statement and the CPU of the RDS instance hardly crosses 80% during the bursts, so I doubt scaling would help much either.
What other techniques can I use to prevent the database from being hit hard when the API gets burst calls that are mostly unique and difficult to cache?
Use a cache warmed at boot time
You can load all the necessary columns into an in-memory data store (Redis). Every update to the database (via a cron job) also updates the cached data.
Problems: memory overhead and the cost of keeping the cache up to date
Limit DB calls
Create a buffer for IDs: store n IDs and then make one query for all of them, or flush the buffer every m seconds (see the sketch after these options).
Problems: added client response time; extra processing to split the combined query result
Change your database
Use a NoSQL database for this data. According to this article and this one, I think a NoSQL database is a better fit here.
Problems: multiple data stores
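The buffering option above can be implemented with the dataloader package, which coalesces all lookups arriving in the same event-loop tick into one IN query. A sketch, assuming Sequelize raw queries (table and column names are from the question; everything else is illustrative):

```typescript
import DataLoader from 'dataloader';
import { QueryTypes, Sequelize } from 'sequelize';

const sequelize = new Sequelize(process.env.DATABASE_URL!);

const productLoader = new DataLoader<string, unknown>(
  async (ids) => {
    // One round trip for the whole batch instead of one query per ID.
    const rows: any[] = await sequelize.query(
      'SELECT * FROM inventory WHERE expired = false AND product_id IN (:ids)',
      { replacements: { ids: [...ids] }, type: QueryTypes.SELECT }
    );
    const byId = new Map(rows.map((r) => [r.product_id, r]));
    // DataLoader expects results in the same order as the requested keys.
    return ids.map((id) => byId.get(id) ?? null);
  },
  { maxBatchSize: 100, cache: false } // cache off: inventory data changes
);

// In the route handler: many concurrent calls, few queries.
// const product = await productLoader.load(productId);
```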
Start with a covering index to handle your query. You might create an index like this for your table:
CREATE INDEX inv_lkup ON inventory (product_id, expired) INCLUDE (col, col, col);
Mention all the columns in your SELECT in the index, either in the main list of indexed columns or in the INCLUDE clause. Then the DBMS can satisfy your query completely from the index. It's faster.
You could use AWS Lambda throttling to handle this problem. But for that to work, the consumers of your API will need to retry when they get 429 responses, which might be super-inconvenient.
Sorry to say, you may need to stop using lambda. Ordinary web servers have good stuff in them to manage burst workload.
They have an incoming connection (TCP/IP listen) queue. Each new request lands in that queue, where it waits until the server software accepts the connection. When the server is busy, requests wait in that queue; under high load they simply wait a bit longer. In Node.js's case, if you use clustering there's just one of these incoming connection queues, and all the processes in the cluster share it.
The server software you run (to handle your API) has a pool of connections to your DBMS. That pool has a maximum number of connections in it. As your server software handles each request, it awaits a connection from the pool. If no connection is immediately available, the request handling pauses until one is, then proceeds. This too smooths out the flow of requests to the DBMS. (Be aware that each process in a Node.js cluster has its own pool.)
Paradoxically, a smaller DBMS connection pool can improve overall performance by avoiding too many concurrent SELECTs (or other queries) on the DBMS.
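On the pool side, the cap and the acquire timeout are explicit in Sequelize configuration; the numbers below are illustrative, not recommendations:

```typescript
import { Sequelize } from 'sequelize';

const sequelize = new Sequelize(process.env.DATABASE_URL!, {
  pool: {
    max: 10,        // hard cap on concurrent DB connections per process
    min: 0,
    acquire: 60000, // ms a request waits for a connection before
                    // SequelizeConnectionAcquireTimeoutError is thrown
    idle: 10000,    // ms before an unused connection is released
  },
});
```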
This kind of server configuration can be scaled out: a load balancer will do. So will a server with more cores and more nodejs cluster processes. An elastic load balancer can also add new server VMs when necessary.
I went through this article and the following raised a question:
QUEUED INPUTS: If you're receiving a high amount of concurrent data, your database can become a bottleneck. As depicted above, Node.js can easily handle the concurrent connections themselves. But because database access is a blocking operation (in this case), we run into trouble.
Isn't DB access an asynchronous operation in Node.js? E.g. I usually perform all possible data transformations using MongoDB aggregation to minimize the impact on Node.js. Or am I getting things wrong?
That is why callbacks came into the picture. That is the actual use of callbacks: since we don't know how much time the DB will take to process the aggregation, the result is delivered through a callback (or promise) once it's ready. From your code's point of view, DB access is asynchronous precisely because of this callback mechanism.
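A minimal illustration with the official MongoDB Node.js driver: aggregate() returns immediately with a cursor, and the event loop stays free until the result comes back over the network (the database name and pipeline are hypothetical):

```typescript
import { MongoClient } from 'mongodb';

const client = new MongoClient(process.env.MONGO_URL!);

async function topProducts() {
  // The pipeline executes inside MongoDB; Node only awaits the reply,
  // so the event loop keeps serving other requests in the meantime.
  return client
    .db('shop')
    .collection('orders')
    .aggregate([
      { $group: { _id: '$productId', total: { $sum: '$qty' } } },
      { $sort: { total: -1 } },
      { $limit: 10 },
    ])
    .toArray();
}
```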
I'm having trouble with poor performance on CouchDB's _changes feed when there are multiple observers.
I have CouchDB running inside a virtual machine on a laptop, and multiple iOS clients are consuming _changes?feed=continuous on one of the databases over the network, using CouchDB's HTTP API. As the number of clients increases, the speed at which the changes come through slows to a crawl.
N.B. I'm actually communicating with CouchDB via an Apache reverse proxy, which is compressing the responses.
And I'm also noticing that, with a filter applied to the feed, it will often go for long periods without delivering any changes to the HTTP stream, almost as if I'm waiting for it to check a batch of documents that don't match my filter.
Are there any settings I can enable or optimisations I can make that will help speed this all up?
The increase in latency with the number of consumers of a filtered _changes feed is no surprise when you realize that, for each change, CouchDB has to ask the query server to evaluate the filter() function. Apparently it doesn't cache the results, so it has to perform this evaluation once per consumer.
Something you could try is dropping the filter parameter and using include_docs=true instead. This way the feed producer wouldn't have to ask the view server to evaluate every change, which should make it more responsive. Of course, this comes at the price of significantly increasing the amount of data transferred in the feed, and you have to duplicate the filter() function logic on the client side. It's not ideal, but I think it's worth a shot.
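A sketch of that client-side filtering against the continuous feed, using plain Node HTTP streaming (the URL, missing auth, and the predicate are placeholders for whatever the filter() function used to check):

```typescript
import http from 'node:http';

http.get(
  'http://localhost:5984/mydb/_changes?feed=continuous&include_docs=true&since=now',
  (res) => {
    res.setEncoding('utf8');
    let buf = '';
    res.on('data', (chunk: string) => {
      buf += chunk;
      let nl: number;
      // The continuous feed is newline-delimited JSON with heartbeat blank lines.
      while ((nl = buf.indexOf('\n')) >= 0) {
        const line = buf.slice(0, nl).trim();
        buf = buf.slice(nl + 1);
        if (!line) continue; // heartbeat
        const change = JSON.parse(line);
        // Reapply the old filter() logic here, on the included doc.
        if (change.doc && change.doc.type === 'message') {
          console.log('relevant change:', change.id);
        }
      }
    });
  }
);
```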