Number of requests that can be simultaneously sent to TrueVault from an IP?

I'm writing a program to clean up irrelevant data from my Vaults. I estimate it will generate many thousands of requests to work through all the data.
What limits (requests per second, or number of simultaneously executing requests) do I need to bake into my code?

Related

air traffic controller for threads when calling a REST API

DISCLAIMER: If this post is off-topic for this site, please recommend a site where it would be appropriate.
On Ubuntu 18.04, in bash, I am writing a network-based, threaded application that requires multiple servers. It receives files through the network and processes them, ultimately making an API call that finishes the processing and logs the results to a database for later retrieval and reporting.
So far I have written the application using non-threaded programming models and concepts, meaning the files are processed one at a time in real time. This works great as long as there is no sudden burst of files and no backlog to work through. The main bottleneck has been the way I sequentially send files to the API one after another, waiting until the entire operation has completed for one file and the API has returned its results. The API has a rate limit of 8 calls per second, but since each call takes 0.75 to 1 second, my program waits for each operation to finish and pushes only about 1 file per second through the API. In short, I did not have to worry about scheduling API calls because I could barely make one call per second.
Since the capacity is there to process 8 files per second, and I need more speed, I have been converting my single-threaded, sequential application into a parallel, scalable, multi-threaded one. This new version can spawn enough threads to send 8 files per second to the REST API, and many more. So now I have the opposite problem: I am sending too many requests per second to the REST API and am in danger of triggering penalties. Ultimately, when my traffic is higher, I will upgrade my API subscription to get more calls per second, but this current dilemma has me thinking about how to schedule API calls across different threads.
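For illustration only (my pipeline is in bash; this sketch is Python, and call_api is a hypothetical stand-in for my real upload-and-process step): within a single process, the coordination could be as small as a shared limiter that hands out start times. The ATC idea below is essentially this object lifted out into a network service, so that threads on multiple servers can share one schedule.

```python
import threading
import time

def call_api(path):
    # Hypothetical stand-in for the real upload + API call (0.75-1 s each).
    time.sleep(0.75)

class RateLimiter:
    """Hands out start times so calls never exceed `rate` per second."""
    def __init__(self, rate):
        self.interval = 1.0 / rate
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def acquire(self):
        # Reserve the next free slot, then sleep until it arrives.
        with self.lock:
            slot = max(self.next_slot, time.monotonic())
            self.next_slot = slot + self.interval
        time.sleep(max(0.0, slot - time.monotonic()))

limiter = RateLimiter(rate=8)  # the API's 8-calls-per-second budget

def worker(path):
    limiter.acquire()  # wait for a free slot
    call_api(path)     # then make the call

threads = [threading.Thread(target=worker, args=(f"file{i}",)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```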
The purpose of this post is to discuss an idea for scheduling these REST API calls across various threads: how to coordinate timing and usage of the API efficiently without overloading it. In short, I want to coordinate a group of threads so that the API is used properly, not too fast and not too slow.
Independent of my application, this idea could be useful in a number of generically similar scenarios.
My idea is to create an "air traffic controller" ("ATC") so that the threads of the application have a centralized timing authority to consult when they are ready to submit files to the REST API. The ATC would know how many time slots/calls per time period (in this case, calls per second) the API can handle. It would listen for threads requesting a time slot ("launch code"), granting each a slot in the future to perform its API call, based on the schedule of launch codes already handed out.
In my case, from the start of the file upload to the API, it could take 0.75 to 1 second to complete the processing and receive a response. This does not affect the count of new API calls that can be made; it is just a matter of how long a thread will wait once it calls the API, and may not be relevant to the overall discussion.
Each thread would obviously have to do some error handling. If the API timed out or threw an error, the thread would have to handle it, get back in line with the ATC, if appropriate, and ask for a new launch code. Maybe it should also report the error to the ATC for centralized logging?
In situations where file processing needs to burst above 8 files per second, there would be a scheduling backlog, and the threads would wait their turn as assigned by the ATC.
Here are some other considerations:
Function
The ATC would be a lightweight daemon that does the following (a minimal sketch of this core follows the second list below):
- listens on some TCP port
- receives a request
security token (?), thread id, priority
- authenticates the request (?)
- examines schedule
- reserves the next available time slot
- returns the launch code
security token (?), current time, launch timing offset to current time, URL and auth token for the API
- expunges expired launch codes
The ATC would need the following:
- to know what port it is supposed to run on
- to know how many slots per time period it was set to schedule
(e.g. 8 per second)
- to have super-fast read/write access to the schedule (an associative array?)
- to know the URL and corresponding auth token for the thread to use
- maybe to know multiple URLs and auth tokens for load balancing
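To make this concrete, here is a minimal sketch of the slot-reservation core (Python for illustration; the port listening, authentication, and launch-code expiry are omitted, and API_URL/API_TOKEN are hypothetical placeholders):

```python
import threading
import time

API_URL = "https://api.example.com/process"  # hypothetical
API_TOKEN = "REPLACE_ME"                     # hypothetical

class AirTrafficController:
    """Reserves launch slots at a fixed rate (e.g. 8 per second)."""
    def __init__(self, slots_per_second=8):
        self.interval = 1.0 / slots_per_second
        self.lock = threading.Lock()  # serializes access to the schedule
        self.next_free = time.time()  # next unreserved time slot

    def request_launch_code(self, thread_id, priority=0):
        # Reserve the next available slot (priority handling omitted).
        with self.lock:
            now = time.time()
            launch_at = max(self.next_free, now)
            self.next_free = launch_at + self.interval
        return {
            "thread_id": thread_id,
            "current_time": now,
            "offset": launch_at - now,  # wait this long, then call the API
            "url": API_URL,
            "auth_token": API_TOKEN,
        }
```

A thread would sleep for offset seconds and then fire its call; wrapping this object in a small TCP server would turn it into the standalone daemon described above.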
Here are more things to consider:
Security
How could we keep the ATC secure while ensuring high performance?
Network-level security (e.g. firewalls allowing only the IP addresses of the file-processing servers?)
Auth tokens or logins and passwords?
Performance
What would the requirements be for this ATC server? Would it be taxing on CPU and memory?
Timing
How often would an NTP call be needed? By the ATC server? By the servers which call the API?
Scalability
Being able to provide different URLs and auth tokens would allow the ATC to load balance with different API providers.
Threading of the ATC itself
Would the ATC need to spawn threads to be able to handle each new request?
How does a web server handle requests?
How would the various threads share a common schedule?
In a non-threaded environment, the ATC would possibly keep an associative array in memory to keep performance as high as possible. How would the various threads of the ATC have access to the same schedule?
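For example, I picture each connection handler running in its own thread while sharing a single schedule object behind a lock, along the lines of this Python sketch (the port and rate are invented for illustration):

```python
import json
import socketserver
import threading
import time

class Schedule:
    """One shared schedule; the lock makes it safe across handler threads."""
    def __init__(self, slots_per_second=8):
        self.interval = 1.0 / slots_per_second
        self.lock = threading.Lock()
        self.next_free = time.time()

    def reserve(self):
        with self.lock:
            slot = max(self.next_free, time.time())
            self.next_free = slot + self.interval
        return slot

schedule = Schedule()  # a single instance shared by every handler thread

class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        # One thread per connection; all of them hit the same schedule.
        slot = schedule.reserve()
        self.wfile.write(json.dumps({"launch_at": slot}).encode() + b"\n")

if __name__ == "__main__":
    with socketserver.ThreadingTCPServer(("0.0.0.0", 9090), Handler) as srv:
        srv.serve_forever()
```

So the threads of the ATC would share the in-memory associative array naturally; the lock (or a single dispatcher thread) is what keeps concurrent reservations consistent.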
So here is my question. Does this exist? If not, what are some best practices in trying to build the above?
It seems like a beanstalkd kind of network service, except it only provides permission/scheduling and is extremely dependent on timing.

duplicate traffic on server

I have real live data of HTTP requests made to my server. I want to duplicate those requests with their real timestamps.
I'm reading/parsing the requests and their timestamps from a CSV file. I have about a million requests, and I want to test how many my server can handle before it crashes.
I've been doing some reading, and it seems the best thing to use is either BeanShell or Groovy.
My problem is that I'm not really sure how to use them with a custom timer for the requests I send. I want to read the timestamps from the CSV, calculate the delay between each request, and send the requests based on that delay.
Any thoughts? Or, if someone has a better way to do this whole thing, that would also help.
That seems like a huge overhead when you just need to find the peak number of transactions or concurrent users, and/or find the transactions-per-second rate, and then run the "duplication" according to those parameters.
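That said, if you do want to preserve the original timing, the core replay logic is small enough to sketch outside JMeter. A minimal Python version, assuming a hypothetical two-column CSV of epoch-seconds timestamp and URL, sorted by timestamp:

```python
import csv
import threading
import time
import urllib.request

def replay(csv_path):
    # Each row is assumed to look like: 1589000000.123,http://host/path
    with open(csv_path, newline="") as f:
        rows = [(float(ts), url) for ts, url in csv.reader(f)]
    start, first_ts = time.monotonic(), rows[0][0]
    for ts, url in rows:
        # Sleep until this request's original offset from the first one.
        delay = (ts - first_ts) - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
        # Fire in a thread so slow responses don't skew the schedule.
        threading.Thread(target=urllib.request.urlopen, args=(url,)).start()

replay("requests.csv")  # hypothetical file name
```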

What's the relationship between QPS/TPS, response time and number of concurrent users

Some Concepts:
TPS means Transactions per second
Response time is the total amount of time it takes to respond to a request for service
Is this formula true?
TPS = number of concurrent users / response time
It is true if transactions happen sequentially, and in only one thread (on one TCP connection) per user. In reality, however, web browsers use multiple concurrent connections when talking to a host. Six concurrent connections is quite common, in which case the host sees TPS = 6 × concurrent users / response time.
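As a worked example with invented numbers: 100 concurrent users and a 0.5-second response time give 100 / 0.5 = 200 TPS over one connection per user, or 6 × 100 / 0.5 = 1200 TPS when each browser keeps six connections busy.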
Also, the browser will sometimes be blocked and not fetch things. Sometimes because it is executing code, sometimes because it cannot perform some operations simultaneously with other operations. See http://www.browserscope.org for more info.
Also, of course, clients (whether they are humans using a browser or e.g. a mobile phone app talking to its backend via a REST API) don't usually make requests back to back, continuously, at the highest possible rate. That is probably not a very realistic test case. Usually, clients will make a bunch of requests and then fall silent for a while, until the user does something new in the application that requires more data from the backend.

handling millions of requests per second: how does a load balancer (main server thread) work

What will happen if I write a server application backed by a thread pool of millions of threads and it gets millions of requests per second?
I have worked on developing web services. The web service was deployed on thousands of computers with a front-end load balancer, whose job was to distribute the traffic amongst the servers that actually process the web requests.
So my question is: since the process running inside the load balancer has to be single-threaded to listen for web requests on a port, how does it handle accepting millions of requests per second? The load balancer might be busy delegating a task; what happens to a request that arrives at that instant?
In my opinion, not all clients will be handled, since there is only a single request-handler thread to pass incoming requests on to the thread pool.
By that logic, no multi-threaded server should ever work.
I wonder how Facebook/Amazon handle millions of requests per second.
You are right, it won't work. There is a limit to how much a single computer can process, which has nothing to do with how many threads it is running.
The way Amazon and Facebook etc. handle it is to have hundreds or thousands of servers spread throughout the world, and to pass the requests out to those various servers. This is a massive subject, though, so if you want to know more I suggest you read up on distributed computing and come back if you have specific questions.
With the edit, the question makes much more sense. It is not hard to distribute millions of requests per second. A distribution operation should take somewhere in the vicinity of tens of nanoseconds, and would merely consist of pushing the received socket onto a queue. No biggie.
As soon as that's done, the balancer is ready to accept the next request.
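A minimal sketch of that accept-and-queue pattern in Python (illustrative only: a real balancer would use an event loop such as epoll rather than a thread pool, and the port and pool size here are invented):

```python
import queue
import socket
import threading

backlog = queue.Queue()  # accepted sockets waiting for a worker

def worker():
    while True:
        conn = backlog.get()  # blocks until the balancer queues a socket
        with conn:
            conn.recv(1024)   # read (and here ignore) the request
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")

for _ in range(16):           # a small worker pool stands in for the server farm
    threading.Thread(target=worker, daemon=True).start()

with socket.create_server(("0.0.0.0", 8080)) as listener:
    while True:
        conn, _addr = listener.accept()
        backlog.put(conn)     # the cheap "distribution" step...
        # ...and the loop is immediately free to accept the next request
```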

Translating requests per second into concurrent users

We're building a web service that needs to handle about 200 requests per second. But most popular load testing tools talk about running a load test with a certain number of "concurrent users".
Could anyone tell me how to translate my requirement of "200 requests per second" into a "number of concurrent users"? I'm new to the field of performance testing, and from all that I've read so far, this aspect doesn't get addressed.
This translation is not possible in the general case. The problem is that a user can make multiple requests. If each user made exactly one request (e.g. your service is completely stateless) and each request took exactly one second, your number of concurrent users would coincide with the number of requests per second.
Otherwise (and those are big assumptions to make), you either track users in your logging and derive the respective numbers from the log, or you add your assumptions to the requirements for the load test.
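A useful rule of thumb here is Little's law: average concurrency equals arrival rate times the time each user spends per request cycle. With invented numbers: 200 requests per second at a 0.5-second response time means about 200 × 0.5 = 100 requests in flight at any moment, and if each simulated user also pauses for 4.5 seconds of think time between requests, you would need roughly 200 × (0.5 + 4.5) = 1000 concurrent users to sustain the target rate.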
