Based on the IIS architecture, a request from a client hitting IIS will pass through the HTTP pipeline, specifically through each HTTP module, and finally reaches the respective HTTP handler in the worker process. Is this happening serially, one after the other?
Say 10,000 requests hit the web server concurrently within a second; does each request get processed one by one? If the web server has a multi-core CPU and plenty of memory, does this help IIS handle the requests simultaneously?
Is there any web server capable of handling requests in parallel?
I just replied to this guy's question - it is very similar, and the answer is the same:
IIS and HTTP pipelining, processing requests in parallel
Let us say a gRPC client makes two requests, R1 and R2, to a gRPC server, one after the other (assume no significant time gap, i.e. R2 is made while R1 has still not been served). Also assume that R1 takes much more time than R2.
In this case, should I expect R2's response first, as it takes less time, or should I expect R1's response first, as that request was made before R2? What will happen and why?
From what I have observed, requests seem to be served in FCFS fashion, so R1's response will be received by the client first and then R2's, but I am not sure.
Theoretically, nothing prevents the server and client from processing gRPC requests in parallel. A gRPC connection is made over HTTP/2, which can handle multiple requests at once. So yes - if the server doesn't use some specific synchronization or limiting mechanism, the requests will be processed with overlap. If server resources or policy don't allow it, they will be processed one by one. I can also add that a request can have a timeout (deadline) after which it is cancelled, so a long wait can lead to cancellation and no processing at all.
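To make the overlap concrete, here is a minimal client-side sketch in TypeScript using @grpc/grpc-js (the demo.proto file, the Demo service, and its slowCall/fastCall methods are invented for illustration). R1 and R2 go out back to back on one HTTP/2 connection; if the server processes them concurrently, R2's callback normally fires first, and the deadline on R1 shows the cancellation behaviour mentioned above.

```typescript
import * as grpc from "@grpc/grpc-js";
import * as protoLoader from "@grpc/proto-loader";

// Load a hypothetical service definition (demo.proto: package demo, service Demo, rpcs SlowCall/FastCall).
const packageDefinition = protoLoader.loadSync("demo.proto");
const proto = grpc.loadPackageDefinition(packageDefinition) as any;

const client = new proto.demo.Demo("localhost:50051", grpc.credentials.createInsecure());

// R1: the slow request, with a 5-second deadline after which gRPC cancels it.
client.slowCall({ id: 1 }, { deadline: new Date(Date.now() + 5000) }, (err: Error | null, res: any) => {
  console.log("R1 finished:", err ? err.message : res);
});

// R2: the fast request, sent immediately after R1 on the same HTTP/2 connection.
client.fastCall({ id: 2 }, (err: Error | null, res: any) => {
  console.log("R2 finished:", err ? err.message : res); // usually logs before R1
});
```

Whether R2 actually overlaps with R1 then depends entirely on the server side, as described above.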
All requests should be processed in parallel. The gRPC architecture of the Java implementation, for example, is divided into two "parts":
The event loop runs in a worker thread group - similar to what we have in reactive implementations - with one thread per core to handle the incoming requests.
The request processing is done on a dedicated thread, which by default comes from a CachedThreadPool.
For single-threaded languages like JavaScript, I am not sure how they do it, but I would guess it is done on the same thread, and therefore the requests end up being queued.
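For the JavaScript case, that guess is essentially right. Here is a minimal server-side sketch with @grpc/grpc-js (using the same invented demo.proto/Demo service as the client sketch above): handlers run on the single event loop, so requests overlap only as long as each handler yields (async I/O, timers); synchronous CPU work inside one handler would queue every other RPC in the process behind it.

```typescript
import * as grpc from "@grpc/grpc-js";
import * as protoLoader from "@grpc/proto-loader";

const packageDefinition = protoLoader.loadSync("demo.proto");
const proto = grpc.loadPackageDefinition(packageDefinition) as any;

const server = new grpc.Server();
server.addService(proto.demo.Demo.service, {
  // Slow but asynchronous: the timer yields the event loop, so FastCall RPCs
  // arriving in the meantime are served immediately.
  slowCall: (call: any, callback: any) => {
    setTimeout(() => callback(null, { message: "slow done" }), 3000);
  },
  // A synchronous busy loop here instead of a timer would block everything else.
  fastCall: (call: any, callback: any) => {
    callback(null, { message: "fast done" });
  },
});

server.bindAsync("0.0.0.0:50051", grpc.ServerCredentials.createInsecure(), (err) => {
  if (err) throw err;
  // start() is a no-op in recent @grpc/grpc-js versions but required in older ones.
  server.start();
});
```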
Does anyone know how many requests per second a basic single instance of a Node HTTP server can handle without queuing any requests?
Actually, I need to write a Node.js app that should be able to respond to about a thousand incoming requests within 100 ms consistently. I'm testing it on a 4-CPU server running 4 instances in cluster mode, but currently it is only able to handle fewer than 1,000 requests per second consistently within 100 ms.
See this answer:
How many clients can an http-server handle?
It's not about queuing but about how many concurrent requests Node can handle at the same time. It contains example benchmarks that you can use and tweak to your needs.
Now, about queuing: in a sense, every request is queued because only one thing can run at the same time in one Node process. So you can say that the answer to how many requests a Node http server can handle without queuing any requests is: one. Just like for Nginx for example.
Now, the time to respond is an entirely different matter and depends on many factors, like what you actually do in those request handlers.
For example, the mean time per request that I got in my experiments here for 10,000 concurrent connections was less than 2 milliseconds. If your handler does more work, it can of course take longer, but there is no single number for all Node servers. It depends on how efficient your implementation is.
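For reference, the benchmark target in experiments like that is usually nothing more complicated than the sketch below (port 3000 and the load-testing command are my own example, not the exact setup from the linked answer):

```typescript
import * as http from "http";

// Minimal "hello world" server: virtually all of the ~2 ms per request is
// connection handling, because the handler itself does almost no work.
const server = http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("ok");
});

server.listen(3000, () => console.log("listening on :3000"));

// Load-test it with a tool such as autocannon, e.g.:
//   npx autocannon -c 10000 -d 10 http://localhost:3000
```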
Now, the big no-nos for handling concurrency in Node are (see the sketch after this list):
Never use any "Sync" function
Never use long-running for and while loops
Never do any CPU-intensive work in the process that handles the requests
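A minimal sketch of the good pattern (the config.json path and the fib-worker.js script are invented for illustration): asynchronous I/O instead of the "Sync" variants, and CPU-heavy work pushed off the event loop into a worker thread.

```typescript
import * as http from "http";
import { readFile } from "fs/promises";
import { Worker } from "worker_threads";

const server = http.createServer(async (req, res) => {
  try {
    // Async I/O instead of readFileSync: the event loop stays free for other requests.
    const config = await readFile("./config.json", "utf8");

    // Offload the CPU-heavy part to a worker thread instead of looping in this process.
    // (In practice you would reuse a small pool of workers rather than spawn one per request.)
    const worker = new Worker("./fib-worker.js", { workerData: 40 });
    worker.once("message", (result) => {
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ configSize: config.length, result }));
    });
    worker.once("error", () => {
      res.writeHead(500);
      res.end();
    });
  } catch {
    res.writeHead(500);
    res.end();
  }
});

server.listen(3000);
```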
What will happen if I write a server application backed by a thread pool of millions of threads and it gets millions of requests per second?
I have worked on developing web services. The web service was deployed on thousands of computers behind a front-end load balancer. The load balancer's job was to distribute the traffic among the servers that actually process the web requests.
So my question is: since the process running inside the load balancer itself HAS to be single-threaded to listen for web requests on a port, how does it handle accepting millions of requests per second? If the load balancer is busy delegating a task, what happens to the request that arrives at that instant?
In my opinion, not all clients will be handled, since there is only a single request-handler thread to pass the incoming requests on to the thread pool.
By that logic, no multi-threaded server should ever work.
I wonder how Facebook/Amazon handle millions of requests per second.
You are right, it won't work. There is a limit to how much a single computer can process, which has nothing to do with how many threads it is running.
The way Amazon and Facebook etc. handle it is to have hundreds or thousands of servers spread throughout the world and to pass the requests out to those various servers. This is a massive subject, though, so if you want to know more I suggest you read up on distributed computing and come back if you have specific questions.
With the edit, the question makes much more sense. It is not hard to distribute millions of requests per second. A distribution operation should take somewhere in the vicinity of tens of nanoseconds, as it merely consists of pushing the received socket onto a queue. No biggie.
As soon as that is done, the balancer is ready to accept the next request.
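To illustrate how cheap that hand-off is, here is a minimal TCP-level balancer sketch in Node/TypeScript (the backend addresses are invented): the accept path only picks a backend and wires the two sockets together, then the listener is immediately free to accept the next connection; all the real work happens in the backends.

```typescript
import * as net from "net";

// Hypothetical backend workers that do the actual request processing.
const backends = [
  { host: "10.0.0.11", port: 8080 },
  { host: "10.0.0.12", port: 8080 },
];
let next = 0;

const balancer = net.createServer((clientSocket) => {
  // Round-robin hand-off: this is the only work done on the accept path.
  const target = backends[next++ % backends.length];
  const upstream = net.connect(target.port, target.host);

  // Proxy bytes in both directions and clean up on errors.
  clientSocket.pipe(upstream).pipe(clientSocket);
  clientSocket.on("error", () => upstream.destroy());
  upstream.on("error", () => clientSocket.destroy());
});

balancer.listen(80, () => console.log("balancer listening on :80"));
```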
I have a few pages/SOAP calls on my website which interact with a lot of other systems remotely to fetch data, process it, make a few synchronous DB calls, and then send the response back. These calls typically take over 10 seconds under load and tend to choke up the requests at IIS.
Is it possible in such a scenario to selectively kill requests or keep them waiting (push them further down the queue under peak load)? If yes, is it possible to do it without InetManager (as in, write code which accesses the worker processes and sorts the requests in the queue in a preferred order, or some other way)?
Update: killing a request is out of the question anyway (even if it comes down to that), as it is going to recycle the app pool.
What is the big deal with HTTP.SYS in IIS 7?
From what I understand, it is low-level, which is good for security. Why?
There is no context switching, which could be expensive. Why?
Please explain.
Thanks!
The benefits are already well documented,
http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/a2a45c42-38bc-464c-a097-d7a202092a54.mspx?mfr=true
By using HTTP.sys to process requests, IIS 6.0 delivers the following performance enhancements:
Kernel-mode caching. Requests for cached responses are served without switching to user mode.
Kernel-mode request queuing. Requests cause less overhead in context switching, because the kernel forwards requests directly to the correct worker process. If no worker process is available to accept a request, the kernel-mode request queue holds the request until a worker process picks it up.
Using HTTP.sys and the new WWW service architecture provides the following benefits:
When a worker process fails, service is not interrupted; the failure is undetectable by the user because the kernel queues the requests while the WWW service starts a new worker process for that application pool.
Requests are processed faster because they are routed directly from the kernel to the appropriate user-mode worker process instead of being routed between two user-mode processes.
http://learn.iis.net/page.aspx/101/introduction-to-iis-7-architecture/
HTTP.sys provides the following benefits:
Kernel-mode caching. Requests for cached responses are served without switching to user mode.
Kernel-mode request queuing. Requests cause less overhead in context switching because the kernel forwards requests directly to the correct worker process. If no worker process is available to accept a request, the kernel-mode request queue holds the request until a worker process picks it up.
Request pre-processing and security filtering.