Assume that only one admission controller Pod is running, and that the admission controller has a webhook that is triggered by Pod deletion events.
Example Scenario
There are two Pods (Pod A and Pod B) within a namespace. Two different users (Alice and Bob) delete Pods at exactly the same time:
Alice deletes Pod A
Bob deletes Pod B
In this specific scenario, will the admission controller handle the two admission requests serially or in parallel? In other words, will the admission controller handle the admission request for Pod A before that of Pod B (or vice versa), or will it handle both admission requests at the same time?
General Scenario
The admission requests are sent from the API Server to the admission controller. Generally speaking, will it be possible that multiple admission requests are sent to the admission controller at the exact same time?
And if so, will the admission controller handle them in parallel via some built-in parallelism mechanism, or will the admission controller queue them and process them serially?
Since the kube-apiserver options include the --max-mutating-requests-inflight and --max-requests-inflight flags, which determine the server's total concurrency limit, I think that the admission controller also supports multi-threading; otherwise it would be a bottleneck for processing API requests.
This is accurate as long as we are using the base environment.
On the other hand, we can customize our environment and specify how requests should be processed.
API Priority and Fairness (APF) can be used for that purpose. APF classifies and isolates requests in a more fine-grained way than --max-mutating-requests-inflight and --max-requests-inflight.
Without APF enabled, overall concurrency in the API server is limited by the kube-apiserver flags --max-requests-inflight and --max-mutating-requests-inflight. With APF enabled, the concurrency limits defined by these flags are summed and then the sum is divided up among a configurable set of priority levels. Each incoming request is assigned to a single priority level, and each priority level will only dispatch as many concurrent requests as its configuration allows.
The default configuration, for example, includes separate priority levels for leader-election requests, requests from built-in controllers, and requests from Pods.
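The division described above can be illustrated with a little arithmetic. The sketch below assumes the proportional rule from the Kubernetes APF documentation (each priority level receives the server total multiplied by its fraction of the concurrency shares, rounded up). The level names and share values are hypothetical, and 400 and 200 are the usual kube-apiserver defaults for the two flags.

```python
import math


def divide_concurrency(max_requests_inflight, max_mutating_requests_inflight, shares):
    """Split the summed in-flight limits among priority levels by share."""
    total = max_requests_inflight + max_mutating_requests_inflight
    share_sum = sum(shares.values())
    return {
        level: math.ceil(total * share / share_sum)
        for level, share in shares.items()
    }


# Hypothetical priority levels and shares, for illustration only.
limits = divide_concurrency(400, 200, {"leader-election": 1, "controllers": 2, "pods": 3})
print(limits)
```

Each level then dispatches at most its own number of concurrent requests, so a flood of requests in one level cannot consume the slots assigned to another.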
So, it's quite a broad question. It all depends on which admission controller we are using (built-in or custom), which controls are used (the --max-mutating-requests-inflight and --max-requests-inflight command-line flags), and the APF configuration.
Related
We are using Ktor and Kotlin 1.5 to implement a REST service backed by Netty. A couple of things about this service:
"Work" takes a non-trivial amount of time to complete.
A unique client endpoint sends multiple requests in parallel to this service.
There are only a handful of unique client endpoints.
The service is not scaling as expected. We ran a load test with parallel requests coming from a single client and we noticed that we only have two threads on the server actually processing the requests. It's not a resource starvation problem - there is plenty of network, memory, CPU, etc. and it doesn't matter how many requests we fire up in parallel - it's always two threads keeping busy, while the others are sitting idle.
Is there a parameter we can configure to increase the number of threads available to process requests for specific endpoints?
Netty uses what is called a non-blocking I/O model (http://tutorials.jenkov.com/java-concurrency/single-threaded-concurrency.html).
In this model a single thread can handle many tasks concurrently, as long as you follow best practices (not blocking the event loop thread).
You might need to check the following configuration options for Netty (https://ktor.io/docs/engines.html#configure-engine):
connectionGroupSize = x
workerGroupSize = y
callGroupSize = z
Default values usually are set rather low and tweaking them could be useful for the time-consuming 'work'. The exact values might vary depending on the available resources.
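The effect of a small worker pool can be reproduced outside Ktor. The following is a minimal Python sketch, not Ktor or Netty code: an event loop hands blocking "work" to a thread pool, and the pool size (a stand-in for a setting such as callGroupSize) caps how many requests are processed at once. The function names and the 0.2 s sleep are assumptions made for illustration.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_work(n):
    """Stand-in for time-consuming, blocking request handling."""
    time.sleep(0.2)
    return n * 2


async def handle_all(items, pool_size):
    """Run blocking_work for every item without blocking the event loop."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Each call runs on a pool thread; the event loop stays free.
        futures = [loop.run_in_executor(pool, blocking_work, i) for i in items]
        return await asyncio.gather(*futures)


start = time.monotonic()
results = asyncio.run(handle_all(range(8), pool_size=8))
elapsed = time.monotonic() - start
print(results, round(elapsed, 2))
```

With pool_size=2 the same eight calls take roughly four times as long, which resembles the "only two threads busy" symptom described in the question.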
DISCLAIMER: If this post is off-topic to this site, please recommend a site where this post would be appropriate.
On Ubuntu 18.04, in bash, I am writing a network-based, threaded application that requires multiple servers. It receives files through the network and processes them, ultimately making an API call that finishes the processing and logs the results to a database for later retrieval and reporting.
So far I have written the application using non-threaded programming models and concepts. That means the files are processed one at a time in real-time. This works great if there is no sudden burst of files and/or a backlog of files to process. The main bottleneck has been the way I sequentially send files to the API one after another, waiting until the entire operation has taken place for one file and the API returns the results. The API has a rate limit of 8 calls per second. But since each call takes from 0.75 to 1 second, my program waits until the operation is done and only processes about 1 file per second through the API. In short, I did not have to worry about scheduling API calls because I could barely do one call per second.
Since the capacity is there to process 8 files per second, and I need more speed, I have been converting my single-threaded, sequential application into a parallel, scalable, multi-threaded application. This new version can spawn enough threads to send 8 files per second to the REST API and much more. So now I have the opposite problem. I am sending too many requests per second to the REST API and am in danger of triggering penalties, etc. Ultimately, when my traffic is higher, I will upgrade my subscription to the API and get more calls per second, but this current dilemma has got me thinking about how to schedule the API calls with different threads.
The purpose of this post is to discuss an idea about how to schedule these REST API calls across various threads. Specifically, I want to discuss how to coordinate timing and usage of the API while maintaining efficiency and yet not overloading the API. In short, I want to coordinate a group of threads so that the API is properly used. Not too fast and not too slow.
Independent of my application, this idea could be useful in a number of generically similar scenarios.
My idea is to create an "air traffic controller" ("ATC") so that the threads of the application have a centralized timing authority to check when they are ready to submit files to the REST API. The ATC would know how many time slots/calls per time period (in this case, calls per second) the API can schedule. The ATC would be listening for the threads to request a time slot ("launch code") which would give them a time slot in the future to perform their API call. The ATC would decide based on the schedule of other launch codes that it has already handed out.
In my case, from the start of the upload of the file to the API, it could take 0.75 to 1 second to complete the processing and receive a response from the API. This does not affect the count of new API calls that can be performed. It is just a consideration of how long the threads will be waiting once they call the API. It may not be relevant to this overall discussion.
Each thread would obviously have to do some error handling. If the API timed out or threw an error, then the thread would have to handle it and, if appropriate, get back in line with the ATC and ask for a new launch code. Maybe it should report the error to the ATC for centralized logging?
In situations where the file processing needs burst above 8 files per second, there would be a scheduling backlog where the threads should wait their turn as assigned by the ATC.
Here are some other considerations:
Function
The ATC would be a lightweight daemon that does the following:
- listens on some TCP port
- receives a request
  (security token (?), thread id, priority)
- authenticates the request (?)
- examines the schedule
- reserves the next available time slot
- returns the launch code
  (security token (?), current time, launch timing offset to current time, URL and auth token for the API)
- expunges expired launch codes
The ATC would need the following:
- to know what port it is supposed to run on
- to know how many slots per time period it is set to schedule
  (e.g. 8 per second)
- to have a super fast read/write access to the schedule (associative array?)
- to know the URL and corresponding auth token for the thread to use
- maybe to know multiple URLs and auth tokens for load balancing
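The scheduling core of such an ATC can be sketched in a few lines. This is a minimal, in-process Python sketch rather than the networked daemon described above; the class name LaunchScheduler, its reserve method, and the injectable clock are hypothetical names chosen for illustration. It spaces launch times evenly (1/8 s apart for 8 calls per second) and uses a lock so that several threads can share one schedule.

```python
import threading
import time


class LaunchScheduler:
    """Hands out evenly spaced launch times so callers stay within rate_per_sec."""

    def __init__(self, rate_per_sec, clock=time.monotonic):
        self._interval = 1.0 / rate_per_sec
        self._clock = clock
        self._lock = threading.Lock()
        self._next_free = None  # start time of the next unreserved slot

    def reserve(self):
        """Reserve the next free slot; return how many seconds the caller must wait."""
        with self._lock:
            now = self._clock()
            # If the schedule has fallen behind the clock, restart it at "now".
            if self._next_free is None or self._next_free < now:
                self._next_free = now
            slot = self._next_free
            self._next_free = slot + self._interval
            return slot - now


# One scheduler shared by all worker threads (8 calls per second).
scheduler = LaunchScheduler(rate_per_sec=8)
```

A worker thread would call time.sleep(scheduler.reserve()) immediately before its API call. Whether evenly spaced slots satisfy the provider's limit depends on how the provider counts calls, so this is an assumption to verify against the API's documentation.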
Here are more things to consider:
Security
How could we keep the ATC secure while ensuring high performance?
Network-level security (e.g. firewalls allowing only the IP addresses of the file-processing servers?)
Auth tokens or logins and passwords?
Performance
What would the requirements be for this ATC server? Would this be taxing to a CPU and memory?
Timing
How often would an NTP call be needed? By the ATC server? By the servers which call the API?
Scalability
Being able to provide different URLs and auth tokens would allow the ATC to load balance with different API providers.
Threading of the ATC itself
Would the ATC need to spawn threads to be able to handle each new request?
How does a web server handle requests?
How would the various threads share a common schedule?
In a non-threaded environment, the ATC would possibly keep an associative array in memory to keep performance as high as possible. How would the various threads of the ATC have access to the same schedule?
So here is my question. Does this exist? If not, what are some best practices in trying to build the above?
It seems like a beanstalkd kind of network service except it only provides permission/scheduling and is extremely dependent on timing.
I'm new to JMeter and am currently trying to get the best use out of it to create an API performance test plan.
Let's take the following scenario.
We have an API which returns data such as part availability and order details for a range of parts.
I want to analyse the response times of the API under different load patterns.
Let's say we have 5 users.
- Each user sends a series of repeated requests to the API.
- The request made by each user is unique to that user,
i.e.
User 1 requests parts a, b, c.
User 2 requests parts d, e, f... and so on.
- All users are sending their requests at the same time.
The way I have approached this is to create 5 separate thread groups for each user.
Within each thread group is the specific http request that gets sent by each user.
Each HTTP request is governed by its own Loop Controller, where I have set the number of times each request is to be sent.
Since I want all users to be sending their requests at once, I have unchecked
“run thread groups consecutively” in the main test plan. At a glance the test plan looks something like this:
[image: test plan view]
Since I'm new to using JMeter and performance testing, I have a few questions regarding my approach:
Is the way I have structured the test plan suitable and maintainable in terms of increasing the number of users that I may wish to test with?
Or would it have been better to have a single thread group with 5 child loop controllers, each containing the user specific request body data?
With my current setup, each thread group uses the default ramp-up time of 1 second. I figured this is okay since each thread group represents only one user. However, I think this might cause a delay at the start of each test run. Are there any other, potentially better ways to handle this, such as using the scheduler or incrementing the ramp-up time for each thread group so that they don't all start at exactly the same time?
Thanks in advance for any advice
Your approach is correct.
If you want the requests to be in parallel they will have to be in separate Thread Groups. Each Thread Group should model a use-case. In your case, the use-case is a particular mix of requests.
By running the test for sufficiently long time you will not feel the effects of ramp-up time.
First of all, your test needs to be realistic: it should represent real users (or user groups) as closely as possible. If the test does that, it is a good test; if it does not, it is not. Something like:
If User1 and User2 represent two different groups of users (for example, User1 is authenticated and User2 is not, or User1 is an admin and User2 is a guest), they should go into different Thread Groups.
It is better to use Thread Group iterations instead of Loop Controllers, as some test elements like the HTTP Cookie Manager have settings like Clear Cookies each Iteration which don't respect iterations produced by a Loop or While Controller; they consider only Thread Group-driven iterations.
The only way to guarantee sending requests at the same time is to put them under one Thread Group and use a Synchronizing Timer.
When it comes to a real load test, you should always add the load gradually so that you can correlate metrics like response time, throughput and error rate with the increasing number of virtual users. The same approach should be applied to "ramping down": you should not turn off the load all at once, so that you can see how your application recovers after the load. You might want to use some custom Thread Groups available via the JMeter Plugins project, like:
Stepping Thread Group
Ultimate Thread Group
They provide a flexible and convenient way to set the desired load pattern.
I have a requirement: the backend can accept only 20 parallel requests at a time. It is shared by many other clients, so it is not dedicated.
I have 100 requests ready to be sent to the backend, but according to the requirement only 20 requests should reach the backend at once.
How can I control the number of requests sent to the backend?
I checked TIBCO BW Administrator and found that only the load on the start-up process (that is, incoming messages) can be controlled with the max job count properties.
How would TIBCO control the count of outgoing requests? Is there a max job count parameter for this, or any external way?
I assume it has to do with your business logic. However, you may not want to control the process's thread creation for this. You may want to be a little creative and design two different processes:
one to receive the requests and log them into a DB, and another to pick up 20 jobs at a time and send them to the backend.
Moreover, you haven't specified whether you want to use SOAP over HTTP or JMS. Over JMS we have more options to control this scenario without introducing a second process.
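One way to picture the second process (the one that picks up queued jobs and forwards them) is a fixed pool of 20 workers draining the backlog. The sketch below is plain Python, not TIBCO BW configuration; send_all and the send callback are hypothetical names, and the limit of 20 comes from the requirement in the question.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 20  # backend limit stated in the question


def send_all(requests, send, max_in_flight=MAX_IN_FLIGHT):
    """Call send(request) for every request, never more than max_in_flight at once."""
    # The pool owns exactly max_in_flight threads, so at most that many
    # send() calls run concurrently; the remaining requests wait in its queue.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(send, requests))
```

This only caps the requests coming from this one sender. Since the backend is shared with other clients, the total load it sees can still be higher.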
Hope this helps.
I want to tailor the application I am making, which communicates with the QuickBooks server and adds things like customers and check expenses, to be as efficient as possible regarding performance. For example, my intention is to have all customer additions (batch process) on one thread and all check expenses or bills (batch process) on another thread, which is logically possible as the two procedures don't interfere with and are not related to one another.
My question is would such a design approach be permissible by Intuit? I guess my concern is regarding any limitations on communication with their servers.
On the docs site, the following throttling policy is mentioned.
What are the throttling limits based on QB accounts, OAuth client, and RealmId at any given time?
EDIT Following line is not valid anymore. FAQ page is updated.
Apart from an upper limit set that ensures no more than 10 requests in progress at any given time;
EDIT
We have a throttling policy across all IDS APIs to permit 500 requests/minute per AuthId and per RealmId. The policy permits 200 requests/minute per AuthId for reports endpoints.
Ref - https://developer.intuit.com/docs/0025_quickbooksapi/0058_faq
So, if you follow the above throttling limit then parallel processing using multiple threads is not an issue.
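Staying under such a limit from several threads can be done with a shared limiter. The sketch below is a generic sliding-window limiter in Python, not part of any Intuit SDK; the class and method names are hypothetical, and the 500-per-60-seconds figures are taken from the policy quoted above.

```python
import collections
import threading
import time


class SlidingWindowLimiter:
    """Allows at most max_calls within any period_sec window, across threads."""

    def __init__(self, max_calls, period_sec, clock=time.monotonic):
        self._max_calls = max_calls
        self._period = period_sec
        self._clock = clock
        self._lock = threading.Lock()
        self._stamps = collections.deque()  # times of recently granted calls

    def try_acquire(self):
        """Return True and record the call if a request may be sent now."""
        with self._lock:
            now = self._clock()
            # Forget calls that have left the window.
            while self._stamps and now - self._stamps[0] >= self._period:
                self._stamps.popleft()
            if len(self._stamps) < self._max_calls:
                self._stamps.append(now)
                return True
            return False


# One limiter shared by all worker threads for a given AuthId/RealmId.
limiter = SlidingWindowLimiter(max_calls=500, period_sec=60)
```

Each thread would check limiter.try_acquire() before sending a request and back off briefly when it returns False.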
Note: You can't create multiple name entities (e.g. Vendor, Employee and Customer) using parallel threads. The service puts a lock across these 3 entities to ensure that a unique name is used when creating a new entity.
Thanks