I have an implementation with one wallet per client/company and a number of users belonging to each wallet.
I receive hundreds of authorization API calls per second from users. When an authorization call comes in, the following activities are performed:
Lock the wallet belonging to the current user.
Make sure validation rules are passed.
Check that the wallet has a sufficient balance for the transaction to succeed.
Update wallet balance.
Unlock the wallet belonging to the current user.
There are other calculations and updates on different tables as well, and the total time taken can be anywhere from 500 to 1000 milliseconds.
This works fine when we have fewer users per wallet and fewer simultaneous calls per second from users belonging to one wallet.
How can I scale this architecture, or what changes can I make to this implementation, so that I can support hundreds of requests per second against the same wallet?
The issue is that under many requests per second, the lock on the wallet causes requests to queue up and time out.
I have been reading about event-based systems, but every authorization call here is a synchronous API call, so I am not sure how to adapt this to an event-based architecture.
I can think of a few options to try in this situation to improve concurrency under heavy load.
Use optimistic locking instead of pessimistic locking, then either retry the failed transactions or ask the end user to retry (see the sketch after this list).
Reference: https://vladmihalcea.com/optimistic-locking-version-property-jpa-hibernate/
Use Akka. Reference: https://akka.io/
Use JCTools: MPSC. Reference: https://github.com/JCTools/JCTools
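For illustration, here is a minimal sketch of the optimistic-locking option in TypeScript using the pg client. The wallets table, its version column, and the retry count are assumptions for the example, not details from the question:

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the standard PG* environment variables

// Try to debit the wallet with a version check instead of a pessimistic lock.
// Returns true on success, false on insufficient balance or after exhausting retries.
async function debitWallet(walletId: string, amount: number, maxRetries = 3): Promise<boolean> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const { rows } = await pool.query(
      "SELECT balance, version FROM wallets WHERE id = $1",
      [walletId]
    );
    if (rows.length === 0 || Number(rows[0].balance) < amount) return false;

    // The UPDATE matches only if nobody bumped the version since our read.
    const result = await pool.query(
      "UPDATE wallets SET balance = balance - $1, version = version + 1 WHERE id = $2 AND version = $3",
      [amount, walletId, rows[0].version]
    );
    if (result.rowCount === 1) return true; // we won the race
    // Otherwise another transaction updated the wallet first; re-read and retry.
  }
  return false;
}
```

Under very high contention on a single wallet, even optimistic retries will pile up; that is where the other two options (an Akka actor per wallet, or a single consumer thread fed by a JCTools MPSC queue) can help, since they serialize all updates to one wallet in memory rather than fighting over the database row.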
I just read this article from the Node.js docs: Don't Block the Event Loop.
The Ask
I'm hoping that someone can read over the use case I describe below and tell me whether I'm understanding how the event loop gets blocked, and whether or not I'm actually blocking it. Also, any tips on how I can find this out for myself would be useful.
My use case
I think I have a use case in my application that could potentially cause problems. I have a functionality which enables a group to add members to their roster. Each member that doesn't represent an existing system user (the common case) gets an account created, including a dummy password.
The password is hashed with argon2 (using the default hash type), which means that even before I need to wait on a DB promise to resolve (with a Prisma transaction), I have to wait for each member's password hash to be generated.
I'm using Prisma for the ORM and Sendgrid for the email service and no other external packages.
A takeaway I get from the article is that this blocks the event loop. Since there could potentially be hundreds of records generated (such as when importing contacts from a CSV or a cloud contact service), this seems significant.
To sum up what the route in question does, including some details omitted before:
Remove duplicates (requires one DB request & then some synchronous checking)
Check remaining for existing user
For non-existing users:
Synchronously create many records & push each to a separate array. One of these records requires async password generation for each non-existing user
Once the arrays are populated, send a DB transaction with all records
Once the transaction is cleared, create invitation records for each member
Once the invitation records are created, send emails in a MailData[] through SendGrid.
Clearly, there are quite a few tasks that must be done sequentially. If it matters, the asynchronous functions are also nested: createUsers calls createInvites calls sendEmails. In fact, from the controller, there is: updateRoster calls createUsers calls createInvites calls sendEmails.
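For reference, here is a stripped-down sketch of that chain as I understand it; the model names, field shapes, and the dummy-password value are placeholders rather than the real code:

```typescript
import argon2 from "argon2";
import { PrismaClient } from "@prisma/client";
import sgMail from "@sendgrid/mail";

const prisma = new PrismaClient();
sgMail.setApiKey(process.env.SENDGRID_API_KEY ?? "");

// updateRoster -> createUsers -> createInvites -> sendEmails, heavily simplified.
async function createUsers(newMembers: { email: string }[]) {
  // Each hash is awaited; with hundreds of members, this is where most of the time goes.
  const users = [];
  for (const m of newMembers) {
    const password = await argon2.hash("dummy-password"); // placeholder dummy password
    users.push({ email: m.email, password });
  }
  await prisma.user.createMany({ data: users }); // assumes a `user` model
  await createInvites(newMembers);
}

async function createInvites(members: { email: string }[]) {
  await prisma.invitation.createMany({
    data: members.map((m) => ({ email: m.email })), // assumes an `invitation` model
  });
  await sendEmails(members);
}

async function sendEmails(members: { email: string }[]) {
  // This array corresponds to the MailData[] mentioned above.
  const messages = members.map((m) => ({
    to: m.email,
    from: "noreply@example.com",
    subject: "You have been added to a roster",
    text: "An account was created for you.",
  }));
  await sgMail.send(messages);
}
```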
There are architectural patterns aimed at avoiding the issues caused by potentially long-running operations. Note that while your example is specific, any long-running process could be harmful here.
The first obvious pattern is the cluster. If your app is handled by multiple concurrent, independent event loops in a cluster, blocking one, ten, or even a thousand loops can be insignificant if your app is scaled to handle it.
Imagine a scenario where you have 10 concurrent loops: one is blocked for a longer time, but the 9 remaining loops are still serving short requests. Chances are, users would not even notice the temporary bottleneck caused by the one long-running request.
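A minimal Node.js cluster sketch of that idea (the worker count and port are arbitrary choices for the example):

```typescript
import cluster from "node:cluster";
import http from "node:http";
import { cpus } from "node:os";

if (cluster.isPrimary) {
  // Fork one worker per core; each worker runs its own independent event loop.
  for (let i = 0; i < cpus().length; i++) cluster.fork();
  cluster.on("exit", () => cluster.fork()); // replace a crashed worker
} else {
  // A long-running request blocks only this worker's loop; the others keep serving.
  http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  }).listen(3000);
}
```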
Another, more general pattern is a separate long-running-process service, or Command-Query Responsibility Segregation (I bring up CQRS here because the pattern description may introduce interesting ideas you are not yet familiar with).
In this approach, long-running operations are not handled directly by the backend servers. Instead, the backend servers use a message queue to send requests to yet another service layer of your app, one dedicated solely to running these long-running requests. The message queue is configured with a specific throughput, so if multiple long-running requests arrive in a short time they are queued; some of them may be delayed, but your resources stay under control. The backend that sends requests to the message queue does not wait synchronously; instead, you need another form of return communication.
This auxiliary process service can be maintained and scaled independently. The important part is that the service is never accessed directly from the frontend; it always sits behind a message queue with controlled throughput.
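As a concrete illustration, here is roughly what this could look like with a Redis-backed queue; BullMQ is just one possible choice, and the queue name, connection settings, and rate numbers are assumptions:

```typescript
import { Queue, Worker } from "bullmq";

const connection = { host: "localhost", port: 6379 };

// In the web-facing backend: enqueue the heavy roster import and return immediately.
const rosterQueue = new Queue("roster-import", { connection });

export async function enqueueRosterImport(groupId: string, members: unknown[]) {
  await rosterQueue.add("import", { groupId, members });
  // Return communication happens later, e.g. via a status endpoint, webhook, or websocket.
}

// In the dedicated long-running-process service (a separate process/deployment):
new Worker(
  "roster-import",
  async (job) => {
    // hash passwords, run the Prisma transaction, create invites, send emails...
    console.log(`processing roster import for group ${job.data.groupId}`);
  },
  { connection, concurrency: 2, limiter: { max: 10, duration: 1000 } } // controlled throughput
);
```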
Note that while the second approach is often implemented in real-life systems and solves most issues, it can still be incapable of handling some edge cases, e.g. when long-running requests arrive faster than they are handled and the queue grows indefinitely.
Such cases require careful maintenance: you either scale your app to handle the traffic, or you introduce other rules that prevent users from running long processes too often.
DISCLAIMER: If this post is off-topic to this site, please recommend a site where this post would be appropriate.
On Ubuntu 18.04, in bash, I am writing a network-based, threaded application that requires multiple servers. It receives files through the network and processes them, ultimately making an API call that finishes the processing and logs the results to a database for later retrieval and reporting.
So far I have written the application using non-threaded programming models and concepts. That means the files are processed one at a time, in real time. This works great if there is no sudden burst of files and/or a backlog of files to process. The main bottleneck has been the way I sequentially send files to the API one after another, waiting until the entire operation has completed for one file and the API returns the results. The API has a rate limit of 8 calls per second. But since each call takes from 0.75 to 1 second, my program waits until the operation is done and only processes about 1 file per second through the API. In short, I did not have to worry about scheduling API calls because I could barely do one call per second.
Since the capacity is there to process 8 files per second, and I need more speed, I have been converting my single-threaded, sequential application into a parallel, scalable, multi-threaded application. This new version can spawn enough threads to send 8 files per second to the REST API and much more. So now I have the opposite problem. I am sending too many requests per second to the REST API and am in danger of triggering penalties, etc. Ultimately, when my traffic is higher, I will upgrade my subscription to the API and get more calls per second, but this current dilemma has got me thinking about how to schedule the API calls with different threads.
The purpose of this post is to discuss an idea about how to schedule these REST API calls across various threads. Specifically, I want to discuss how to coordinate timing and usage of the API while maintaining efficiency and yet not overloading the API. In short, I want to coordinate a group of threads so that the API is properly used. Not too fast and not too slow.
Independent of my application, this idea could be useful in a number of generically similar scenarios.
My idea is to create an "air traffic controller" ("ATC") so that the threads of the application have a centralized timing authority to check when they are ready to submit files to the REST API. The ATC would know how many time slots/calls per time period (in this case, calls per second) the API can schedule. The ATC would be listening for the threads to request a time slot ("launch code") which would give them a time slot in the future to perform their API call. The ATC would decide based on the schedule of other launch codes that it has already handed out.
In my case, from the start of the upload of the file to the API, it could take 0.75 to 1 second to complete the processing and receive a response from the API. This does not affect the count of new API calls that can be performed. It is just a consideration of how long the threads will be waiting once they call the API. It may not be relevant to this overall discussion.
Each thread would obviously have to do some error handling. If the API timed out or threw an error, the thread would have to handle it, get back in line with the ATC (if appropriate), and ask for a new launch code. Maybe it should report the error to the ATC for centralized logging?
In situations where the file processing needs burst above 8 files per second, there would be a scheduling backlog where the threads should wait their turn as assigned by the ATC.
Here are some other considerations:
Function
The ATC would be a lightweight daemon that does the following:
- listens on some TCP port
- receives a request
    (security token (?), thread id, priority)
- authenticates the request (?)
- examines schedule
- reserves the next available time slot
- returns the launch code
    (security token (?), current time, launch timing offset to current time, URL and auth token for the API)
- expunges expired launch codes
The ATC would need the following:
- to know what port it is supposed to run on
- to know how many slots per time period it was set to schedule
(e.g. 8 per second)
- to have super-fast read/write access to the schedule (associative array?)
- to know the URL and corresponding auth token for the thread to use
- maybe to know multiple URLs and auth tokens for load balancing
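To make the slot-reservation idea concrete, here is a rough in-memory sketch of that core logic (written in TypeScript purely for illustration, not as the bash implementation; the 8-per-second figure is the one from above):

```typescript
// Minimal in-memory slot scheduler: hands out the earliest future time slot that
// still has capacity. One instance of this would live inside the ATC daemon.
class SlotScheduler {
  private booked = new Map<number, number>(); // slot start (ms, bucketed) -> reservations

  constructor(private callsPerSecond = 8, private slotMs = 1000) {}

  // Returns the timestamp (ms since epoch) at which the caller may fire its API call.
  reserve(now = Date.now()): number {
    let slot = Math.floor(now / this.slotMs) * this.slotMs;
    while ((this.booked.get(slot) ?? 0) >= this.callsPerSecond) {
      slot += this.slotMs; // this second is full, look at the next one
    }
    this.booked.set(slot, (this.booked.get(slot) ?? 0) + 1);
    this.expire(now);
    return slot;
  }

  // Drop slots that are already in the past ("expunges expired launch codes").
  private expire(now: number) {
    for (const slot of this.booked.keys()) {
      if (slot + this.slotMs < now) this.booked.delete(slot);
    }
  }
}

// A worker asks for a launch code and sleeps until its slot arrives.
const atc = new SlotScheduler();
const launchAt = atc.reserve();
setTimeout(() => {
  // perform the API call here
}, Math.max(0, launchAt - Date.now()));
```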
Here are more things to consider:
Security
How could we keep the ATC secure while ensuring high performance?
Network-level security (e.g. firewalls allowing only the IP addresses of the file-processing servers?)
Auth tokens or logins and passwords?
Performance
What would the requirements be for this ATC server? Would this be taxing to a CPU and memory?
Timing
How often would an NTP call be needed? By the ATC server? By the servers which call the API?
Scalability
Being able to provide different URLs and auth tokens would allow the ATC to load balance with different API providers.
Threading of the ATC itself
Would the ATC need to spawn threads to be able to handle each new request?
How does a web server handle requests?
How would the various threads share a common schedule?
In a non-threaded environment, the ATC would possibly keep an associative array in memory to keep performance as high as possible. How would the various threads of the ATC have access to the same schedule?
So here is my question. Does this exist? If not, what are some best practices in trying to build the above?
It seems like a beanstalkd kind of network service, except it only provides permission/scheduling and is extremely dependent on timing.
I'm new to JMeter and am currently trying to get the best use out of it to create an API performance test plan.
Let's take the following scenario.
We have an API which returns data such as part availability and order details for a range of parts.
I want to analyse the response times of the API under different load patterns.
Let's say we have 5 users.
- Each user sends a series of repeated requests to the API.
- The request made by each user is unique to that user.
i.e.
User 1 requests parts a, b, c.
User 2 requests parts d, e, f... and so on.
- All users are sending their requests at the same time.
The way I have approached this is to create 5 separate thread groups, one for each user.
Within each thread group is the specific HTTP request that gets sent by that user.
Each HTTP request is governed by its own Loop Controller, where I have set the number of times each request should be sent.
Since I want all users to send their requests at once, I have unchecked "Run Thread Groups consecutively" in the main test plan. At a glance, the test plan looks something like this:
[screenshot: test plan view]
Since I'm new to using JMeter and performance testing, I have a few questions regarding my approach:
Is the way I have structured the test plan suitable and maintainable in terms of increasing the number of users that I may wish to test with?
Or would it have been better to have a single thread group with 5 child loop controllers, each containing the user specific request body data?
With my current setup, each thread group uses the default ramp-up time of 1 second. I figured this is okay since each thread group represents only one user. However, I think this might cause a delay in the start-up of each test run. Are there any other, potentially better, ways to handle this, such as using the scheduler or adjusting the ramp-up time for each thread group so that they do all start at exactly the same time?
Thanks in advance for any advice
Your approach is correct.
If you want the requests to be in parallel they will have to be in separate Thread Groups. Each Thread Group should model a use-case. In your case, the use-case is a particular mix of requests.
By running the test for sufficiently long time you will not feel the effects of ramp-up time.
First of all, your test needs to be realistic; it should represent real users (or user groups) as closely as possible. If the test does that, it is a good test, and vice versa. Something like:
If User1 and User2 represent two different groups of users (e.g. User1 is authenticated and User2 is not, or User1 is an admin and User2 is a guest), they should go into different Thread Groups.
It is better to use Thread Group iterations instead of Loop Controllers, as some test elements like the HTTP Cookie Manager have settings such as "Clear cookies each iteration" which don't respect iterations produced by a Loop or While Controller; they consider only Thread Group-driven iterations.
The only way to guarantee sending requests at the same time is to put them under one Thread Group and use a Synchronizing Timer.
When it comes to a real load test, you should always add the load gradually so you can correlate metrics like response time, throughput, and error rate with the increasing number of virtual users. The same approach should be applied to ramping down: you should not turn off the load all at once, so that you can see how your application recovers after the load. You might want to use some of the custom Thread Groups available via the JMeter Plugins project, such as:
Stepping Thread Group
Ultimate Thread Group
They provide a flexible and convenient way to set the desired load pattern.
I'm building an application using tag subscriptions in the real-time API and have a question related to capacity planning. We may have a large number of users posting to a subscribed hashtag at once, so the question is how often will the API actually POST to our subscription processing endpoint? E.g., if 100 users post to #testhashtag within a second or two, will I receive 100 POSTs or does the API batch those together as one update? A related question: is there a maximum rate at which POSTs can be sent (e.g., one per second or one per ten seconds, etc.)?
The Instagram API seems to lack detailed information about both how many updates are sent and what the rate limits are. From the API docs:
Limits
Be nice. If you're sending too many requests too quickly, we'll send back a 503 error code (server unavailable).
You are limited to 5000 requests per hour per access_token or client_id overall. Practically, this means you should (when possible) authenticate users so that limits are well outside the reach of a given user.
In other words, you'll need to check for a 503 and throttle your application accordingly. I haven't seen any information on how long they might block you, but it's best to avoid that completely. I would advise you manage this by placing a rate-limiting mechanism in your own code, such as pushing your API requests through a queue with rate control. That will also give you the benefit of a retry if you're throttled, so you won't lose any of the updates.
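As a sketch of what such a rate-controlled queue could look like (the hourly budget mirrors the 5000-per-hour limit quoted above; the retry policy and use of the global fetch are assumptions):

```typescript
// Minimal sketch: a queue that never issues more than maxPerHour calls and
// re-queues a request when the API answers with a 503.
class RateLimitedQueue {
  private queue: string[] = [];
  private issued = 0;
  private windowStart = Date.now();
  private draining = false;

  constructor(private maxPerHour = 5000) {
    setInterval(() => void this.drain(), 1000); // try to make progress every second
  }

  push(url: string) {
    this.queue.push(url);
  }

  private async drain() {
    if (this.draining) return; // don't let ticks overlap
    this.draining = true;
    try {
      if (Date.now() - this.windowStart > 60 * 60 * 1000) {
        this.windowStart = Date.now(); // new hourly window
        this.issued = 0;
      }
      while (this.queue.length > 0 && this.issued < this.maxPerHour) {
        const url = this.queue.shift()!;
        this.issued++;
        const res = await fetch(url); // global fetch, Node 18+
        if (res.status === 503) {
          this.queue.unshift(url); // throttled: put it back and let a later tick retry
          return;
        }
        // ...process await res.json() here...
      }
    } finally {
      this.draining = false;
    }
  }
}
```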
Moreover, a mechanism such as a queue in the case of real-time updates is further relevant because of the following from the API docs:
You should build your system to accept multiple update objects per payload - though often there will be only one included. Also, you should acknowledge the POST within a 2 second timeout--if you need to do more processing of the received information, you can do so in an asynchronous task.
Regarding the number of updates, the API can send you one update or many. The problem with this is that you can absolutely murder your API quota, because I don't think you can batch calls to specific media items, at least not using the official Python or Ruby clients or the API console as far as I have seen.
This means that if you receive 500 updates, whether as one request to your server or split into many, it won't matter: either way, you need to go and fetch those items. From what I observed in a real application, these fetches seemed to count against our quota; however, the quota itself seems to be consumed erratically. That is, sometimes we saw no calls consumed at all, and other times the available calls dropped by far more than we actually made. My advice is to be conservative and take the 5000 as a best guess rather than an absolute. You can check the remaining calls by parsing one of the headers they send back.
Use common sense, don't be stupid, and a rate-limiting mechanism should keep you safe, with the added benefit of handling failures, whether due to outages (these happen more often than you might think), network hiccups, or accidental rate limiting. You could try to be tricky and use different API keys in a pooling mechanism, but this is likely a violation of the TOS, and if they do anything by IP you'd have to split this up across different machines with different IPs.
My final advice would be to restructure your application to not rely completely on the subscription mechanism. It's less than reliable and very expensive API-wise. It's only truly useful if you just need to do something in your app that doesn't require calling back to Instagram, your number of items is small, or you can filter out the majority of items to avoid calling back to Instagram except when a specific business rule is matched.
Instead, you can do things like query the tag or the user (e.g. recent media) and scale it out that way. Normally this lets you grab 100 items with one request rather than 100 items with 100 requests. If you really want to be cute, you could at least merge the subscription notifications asynchronously and combine similar ones, bucketing duplicate characteristics such as the tag into a single batched request. Sort of like a map/reduce, but on a small data set. You could of course run an actual map/reduce on your own data from time to time as another way of keeping the work asynchronous. Again, be careful not to thrash Instagram; just use map/reduce to batch your calls in a way that's useful to your app.
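A small sketch of that bucketing idea: collect notifications for a short window, then issue one recent-media request per distinct tag (the window length, endpoint, and notification shape are assumptions based on the old tag-subscription payloads):

```typescript
// Bucket incoming tag notifications for a few seconds, then make one
// recent-media request per distinct tag, regardless of how many
// notifications arrived for it.
const pendingTags = new Map<string, number>(); // tag -> notifications seen this window

// Called from the subscription endpoint handler; acknowledge the POST immediately
// and just record which tags were touched.
export function recordNotification(n: { object: string; object_id: string }) {
  if (n.object === "tag") {
    pendingTags.set(n.object_id, (pendingTags.get(n.object_id) ?? 0) + 1);
  }
}

setInterval(async () => {
  const tags = [...pendingTags.keys()];
  pendingTags.clear();
  for (const tag of tags) {
    const res = await fetch(
      `https://api.instagram.com/v1/tags/${tag}/media/recent?access_token=ACCESS_TOKEN`
    );
    const body = await res.json();
    // ...diff body.data against what you already stored and process only the new media...
  }
}, 5000);
```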
Hope that helps.
I want the application I am making, which communicates with the QuickBooks server and adds things like customers and check expenses, to be as efficient as possible in terms of performance. For example, my intention is to have all customer additions (batch process) on one thread and all check expenses or bills (batch process) on another thread, which is logically possible since the two procedures don't interfere and are not related to one another.
My question is would such a design approach be permissible by Intuit? I guess my concern is regarding any limitations on communication with their servers.
On the docs site, the following throttling policy is mentioned.
What are the throttling limits based on QB accounts, OAuth client, and RealmId at any given time?
EDIT: The following line is no longer valid; the FAQ page has been updated.
Apart from an upper limit set that ensures no more than 10 requests in progress at any given time;
EDIT
we have a throttling policy across all IDS apis to permit 500 requests/minute per AuthId and per RealmId. The policy permits 200 requests/minute per AuthId for reports endpoints.
Ref - https://developer.intuit.com/docs/0025_quickbooksapi/0058_faq
So, if you follow the above throttling limits, parallel processing using multiple threads is not an issue.
PN - You can't create multiple name entities (e.g. Vendor, Employee, and Customer) using parallel threads. The service puts a lock across these three entities to ensure a unique name is used when creating a new entity.
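A rough sketch of how both constraints could be respected at once: the customer and bill batches run on parallel lanes, every call draws from a shared 500-per-minute budget for the realm, and name entities are created one at a time (the throttle class and the qbo client calls are hypothetical):

```typescript
// Hypothetical sketch: share one 500-calls/minute budget per realm across
// parallel pipelines, and keep name-entity creation sequential.
class RealmThrottle {
  private issued = 0;
  private windowStart = Date.now();

  constructor(private maxPerMinute = 500) {}

  async acquire(): Promise<void> {
    for (;;) {
      if (Date.now() - this.windowStart >= 60_000) {
        this.windowStart = Date.now();
        this.issued = 0;
      }
      if (this.issued < this.maxPerMinute) {
        this.issued++;
        return;
      }
      await new Promise((resolve) => setTimeout(resolve, 250)); // wait for the window to roll over
    }
  }
}

const throttle = new RealmThrottle();

// Customers are a name entity, so create them one at a time.
async function createCustomers(customers: object[]) {
  for (const customer of customers) {
    await throttle.acquire();
    // await qbo.createCustomer(customer); // hypothetical QuickBooks client call
  }
}

// Bills are not name entities, so this batch can run alongside the customer batch.
async function createBills(bills: object[]) {
  for (const bill of bills) {
    await throttle.acquire();
    // await qbo.createBill(bill); // hypothetical QuickBooks client call
  }
}

// Both batches run in parallel but draw from the same per-realm budget.
Promise.all([createCustomers([]), createBills([])]).catch(console.error);
```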
Thanks