I was looking for a way to configure the thread pool per application instead of per profile — basically, to avoid the most important application being starved of threads because it shares a profile with applications that can run in the background.
Is there any way to configure thread pools per application instead of per profile?
The best would be to use a dedicated thread pool for the critical application.
However, in order to achieve that you need to use a separate host-port combination for that particular application.
That is because a web thread pool is associated with a web container transport chain, and a transport chain is determined by a host-port pair.
Related
Currently I'm investigating the possibility of using Azure Service Fabric and its Reliable Services to implement my problem domain architecture.
Problem domain: I am researching distributed large-scale web-crawling architectures involving dozens of parallel agents that crawl web servers and download resources for further indexing.
I've found a useful academic paper which describes an Azure-based distributed web-crawling architecture: Link to .pdf paper. I'm trying to implement and try out a prototype based on this design.
So the basic high-level design looks something like the figure below:
The idea: the Central Web Crawling System Engine (CWCE from here on) runs in an infinite loop until the program is aborted, fetching Service Bus Queue messages, each containing the URL of a page to be crawled. The CWCE component then checks the hostname of this URL and consults the Agent Registrar SQL database to see whether a live agent already exists for the given hostname. If not, the CWCE does one of the following:
If the number of live agents (A_alive) equals the Max value (the upper bound on agents, provided by the application administrator), the CWCE waits until A_alive < Max
If A_alive < Max, the CWCE tries to create a new agent and assign the hostname to it (the agent is then registered in the SQL Registrar database).
Each agent runs on its own partition (URL hostname, for example example.com) and recursively crawls only pages of that hostname, while discovering URLs with external hostnames and adding them to the Service Bus Queue for other agents to process.
The benefit of this architecture would be horizontal scaling of agents and a near-linear increase in crawling throughput.
However, I am very new to Azure Service Fabric and would therefore like to ask whether this PaaS layer is capable of solving this problem. Main questions:
Would it be possible to manually create new web-crawling agent instances programmatically and pass them a hostname parameter using Azure Service Fabric? (Maybe using the FabricClient class to manipulate the cluster and create service instances?)
Which ASF programming model fits this parallel long-running agents scenario best: stateless services, stateful services, or the Actor model? Each agent might run as a long-running task, since it recursively crawls URLs of a specific hostname and listens to the queue.
Would it be possible to control and change this upper bound (Max) on live agents while the application is running?
Would it be possible to have an infinite-loop stateless CWCE service which continuously listens for queue messages in order to spawn new agents?
I am not sure whether Azure Service Fabric is the best fit for this distributed web-crawling use case, so your insights would be very valuable to me. Any helpful resource links would also be appreciated.
Service Fabric will allow you to implement the architecture that you want.
Would it be possible to manually create new web-crawling agent instances programmatically and pass them a hostname parameter using Azure Service Fabric? (Maybe using the FabricClient class to manipulate the cluster and create service instances?)
Yes. The service you develop and deploy to Service Fabric is a ServiceType. Service types don't actually run; instead, from a ServiceType you create the actual Services, which are named. A single service (e.g. ServiceA) will have a number of instances, to allow scaling and availability. You can programmatically create and remove services of a given type and pass parameters to them, so every service will know which URL to crawl.
Check an example here.
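As a rough sketch of what that programmatic creation could look like: the application name `fabric:/CrawlerApp` and service type `CrawlerAgentType` below are hypothetical placeholders for your deployed application and service type, and the hostname is handed over through the service's initialization data.

```csharp
using System;
using System.Fabric;
using System.Fabric.Description;
using System.Text;
using System.Threading.Tasks;

public static class AgentFactory
{
    public static async Task CreateAgentAsync(string hostname)
    {
        var client = new FabricClient(); // connects to the local cluster by default

        var description = new StatelessServiceDescription
        {
            // "fabric:/CrawlerApp" and "CrawlerAgentType" are assumed names
            ApplicationName = new Uri("fabric:/CrawlerApp"),
            ServiceName = new Uri("fabric:/CrawlerApp/Agent_" + hostname),
            ServiceTypeName = "CrawlerAgentType",
            InstanceCount = 1,
            PartitionSchemeDescription = new SingletonPartitionSchemeDescription(),
            // The agent can read this back from its ServiceContext.InitializationData
            InitializationData = Encoding.UTF8.GetBytes(hostname)
        };

        await client.ServiceManagementClient.CreateServiceAsync(description);
    }
}
```

The service instance can then recover the hostname in its constructor from `ServiceContext.InitializationData`.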
Which ASF programming model fits this parallel long-running agents scenario best: stateless services, stateful services, or the Actor model? Each agent might run as a long-running task, since it recursively crawls URLs of a specific hostname and listens to the queue.
I would choose stateless services, because they are the most efficient in terms of resource utilization and the easiest to manage (no need to store and manage state, partitioning, and replicas). The only thing you need to consider is that every service will eventually crash and restart, so you need to store the current crawling position in a permanent store, not in memory.
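A minimal sketch of an agent along those lines, assuming the hostname was passed as initialization data; `LoadPositionAsync`, `CrawlPageAsync`, and `PersistPositionAsync` are hypothetical helpers — the persist step would write to external storage (e.g. your SQL database or table storage), since a stateless instance loses its memory on restart:

```csharp
using System.Fabric;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class CrawlerAgent : StatelessService
{
    private readonly string hostname;

    public CrawlerAgent(StatelessServiceContext context) : base(context)
    {
        // Hostname handed over at service creation time
        hostname = Encoding.UTF8.GetString(context.InitializationData);
    }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        // Resume from the last checkpoint after a crash/restart
        string nextUrl = await LoadPositionAsync(hostname);
        while (!cancellationToken.IsCancellationRequested)
        {
            nextUrl = await CrawlPageAsync(nextUrl);       // hypothetical crawl step
            await PersistPositionAsync(hostname, nextUrl); // checkpoint before moving on
        }
    }

    // Placeholders for external-storage access
    private Task<string> LoadPositionAsync(string host) => Task.FromResult("http://" + host + "/");
    private Task<string> CrawlPageAsync(string url) => Task.FromResult(url);
    private Task PersistPositionAsync(string host, string url) => Task.CompletedTask;
}
```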
Would it be possible to control and change this upper bound (Max) on live agents while the application is running?
Yes. Service Fabric services run on nodes (virtual machines), and in Azure these are managed by Virtual Machine Scale Sets. You can easily add and remove nodes from the VMSS, which will allow you to adjust the total compute and memory available, and the actual number of services is already controlled by you, as described in point 1.
Would it be possible to have an infinite-loop stateless CWCE service which continuously listens for queue messages in order to spawn new agents?
Absolutely. Message-driven microservices are very common. It's technically not an infinite loop, but a service with a Service Bus communication listener. I found one here as a reference, but I don't know whether it's production-ready.
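If you prefer the explicit loop over a communication listener, a sketch of the CWCE as a stateless service could look like this — `crawl-urls` is a hypothetical queue name, the connection string is assumed to come from configuration, and `EnsureAgentForHostAsync` stands in for the Registrar lookup plus agent creation described in the question:

```csharp
using System;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceBus.Messaging;           // classic Service Bus client
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class Cwce : StatelessService
{
    private readonly string connectionString = "<service-bus-connection-string>";

    public Cwce(StatelessServiceContext context) : base(context) { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        var client = QueueClient.CreateFromConnectionString(connectionString, "crawl-urls");
        while (!cancellationToken.IsCancellationRequested)
        {
            BrokeredMessage message = await client.ReceiveAsync();
            if (message == null) continue;              // receive timed out, loop again
            string url = message.GetBody<string>();
            await EnsureAgentForHostAsync(new Uri(url).Host);
            await message.CompleteAsync();              // remove from the queue
        }
    }

    // Placeholder for the Registrar consultation / agent creation
    private Task EnsureAgentForHostAsync(string host) => Task.CompletedTask;
}
```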
I have a set of user-specific stateful services servicing requests forwarded from a public-facing stateless service (web API) in an app.
I'm trying to delete a stateful service if it has not serviced any user request within a given time interval, say an hour. Currently, I'm managing this by keeping a .NET timer in the service itself and using the tick event to self-destruct the service once it has been idle.
Is this the right way to do it? Or is there any other more efficient approach to do this in Azure service fabric?
The mechanism you have will work great and is what we'd normally recommend.
Another way to do it would be to have a general "service manager" service that periodically checked whether services were busy (or was informed by them), and which could kick off the DeleteServiceAsync call. That way only that service would need cluster admin rights, while all the others could be locked down to read-only.
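A sketch of that idle check, assuming `lastRequestUtc` is updated on every serviced request. Note that newer SDK versions take a `DeleteServiceDescription`, while older overloads of `DeleteServiceAsync` take the service `Uri` directly:

```csharp
using System;
using System.Fabric;
using System.Fabric.Description;
using System.Threading.Tasks;

internal sealed class IdleReaper
{
    private readonly FabricClient client = new FabricClient();

    // Called periodically (e.g. from a timer) with each candidate service's
    // name and last-activity timestamp.
    public async Task DeleteIfIdleAsync(Uri serviceName, DateTime lastRequestUtc)
    {
        if (DateTime.UtcNow - lastRequestUtc > TimeSpan.FromHours(1))
        {
            await client.ServiceManagementClient.DeleteServiceAsync(
                new DeleteServiceDescription(serviceName));
        }
    }
}
```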
I am configuring a web garden in IIS server.
I know that a web garden has more than one worker process for an application pool, which means that extra w3wp.exe processes can be created for one web application once its number of requests exceeds the limit. (Please correct me if I am wrong.)
In the above case, is there a possibility of one request being processed by more than one w3wp.exe?
I also need clarity on whether two w3wp.exe processes will be created for a single application, or whether two w3wp.exe processes will share the applications in an app pool.
In the above case, is there a possibility of one request being processed by more than one w3wp.exe?
No, each request is assigned to one and only one worker process.
whether two w3wp.exe processes will be created for a single application? or whether two w3wp.exe processes will share the applications in an app pool?
Two (or more) worker processes will be created for the application pool. Inside the pool you can have multiple applications, or AppDomains in .NET terms. If you have multiple applications in the pool, requests for each application are distributed among the processes. So with two applications and two w3wp.exe processes, you can't tell which w3wp.exe handles the request for a specific application.
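For reference, the web-garden size is the pool's `maxProcesses` setting; one way to set it is with appcmd (the pool name "MyPool" below is a placeholder for your own):

```
%windir%\system32\inetsrv\appcmd.exe set apppool "MyPool" /processModel.maxProcesses:2
```

The equivalent setting is also available in IIS Manager under the application pool's advanced settings (Process Model > Maximum Worker Processes).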
From what I understand both run small repeatable tasks in the cloud.
What reasons and in what situations might I want to choose one over the other?
Some basic information:
WebJobs are good for lightweight work items that don't need any customization of the environment they run in and don't consume very many resources. They are also really good for tasks that only need to be run periodically, scheduled, or triggered. They are cheap and easy to set up and run. They run in the context of your Website, which means you get the same environment that your Website runs in, and any resources they use are resources that your Website can't use.
Worker Roles are good for more resource-intensive workloads, or if you need to modify the environment where they run (i.e. a particular .NET Framework version or something installed into the OS). Worker Roles are more expensive and slightly more difficult to set up and run, but they offer significantly more power.
In general I would start with WebJobs and then move to Worker Roles if you find that your workload requires more than WebJobs can offer.
If we are to measure "power" as computational power, then in a virtual environment, this translates to how many layers are on top of the physical machine (the metal). The user code on a virtual machine runs on top of a hypervisor, which manages the physical machine. This is the thickest layer. Whenever possible the hypervisor tries to simply serve as a pass-through to the metal.
There is fundamentally little overhead for WebJobs. They are sandboxed, the OS is maintained for you, and there are services and modules to make sure they run. But the application code is essentially as close to the metal as in Worker Roles, since they use the same hypervisor.
If what you want to measure is "flexibility", then use Worker Roles, since it is not managed or sandboxed, it is more flexible. You are able to use more sockets, define your own environment, install more packages, etc.
If what you want is "features", then WebJobs has a full array of features. Including, virtual-networking to on-prem resources, staging environments, remote debugging, triggering, scheduling, easy connection to storage and service bus, etc...
Most people want to focus on solving their problem, and not invest time in infrastructure. For that, you use WebJobs. If you do find that you need more flexibility, or the security sandbox is preventing you from doing something that can't be accomplished any other way, then move to Worker Roles.
It is even possible to build hybrid solutions where some parts are done in WebJobs and others are done in Worker Roles, but that's out of the scope of this question. (hint: WebJobs SDK)
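To give a feel for how little infrastructure code a WebJob needs, here is a minimal WebJobs SDK sketch: a function that fires whenever a message lands on a storage queue (the queue name "workitems" is a hypothetical example):

```csharp
using System.IO;
using Microsoft.Azure.WebJobs;

public class Functions
{
    // The SDK binds this to the "workitems" storage queue and invokes the
    // method once per message, with the message body as a string.
    public static void ProcessQueueMessage(
        [QueueTrigger("workitems")] string message, TextWriter log)
    {
        log.WriteLine("Processing: " + message);
    }
}
```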
Some things to remember when choosing between a WebJob and a Worker Role:
A Worker Role is self-hosted on a dedicated VM; a WebJob is hosted in a Web App container.
A Worker Role scales independently; a WebJob scales along with its Web App container.
WebJobs are perfect for polling RSS feeds, checking for and processing messages, and sending notifications; they are lightweight and cheaper than Worker Roles, but less powerful.
I am using Visual Studio 2010 to develop Azure applications. I want to start a worker role inside another worker role. Is this possible? Just like threads, I want to create another worker instance while the application is running. Can anyone help me? I am new to the Azure platform and C#.
I think your level of abstraction may be a bit off. Think of a worker role as a physical machine, not something like a windows service.
Once it's running, you can do anything you would on a standard server, so instead of "like threading", just do threading. (Personally, I recommend the .NET 4 Task Parallel Library; it's awesome ;) )
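For example, plain .NET 4 TPL inside a single role instance gets you parallel work without a second worker role (you would normally kick this off from the role's Run method rather than Main):

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var workers = new Task[4];
        for (int i = 0; i < workers.Length; i++)
        {
            int id = i; // capture the loop variable for the closure
            workers[i] = Task.Factory.StartNew(() =>
                Console.WriteLine("worker " + id + " doing its share"));
        }
        Task.WaitAll(workers); // wait for all parallel work to finish
    }
}
```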
The short answer is no, you can't create a worker role inside another worker role.
You can use a worker role to control other worker roles through the management API (including deploying a service) which may be what you're trying to do.
It is possible that you can achieve what you're trying to do with threads, without having to create a whole separate worker role instance. Could you give us a little more information about what you're trying to do?
Why don't you create a web role that acts as a master and hands work out to worker instances? The workers then run the program and send the output back to the web role, which processes it as needed. The web role and worker roles can talk through queues.
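A sketch of that handoff with the classic Azure Storage SDK — the queue name "work" is a hypothetical example and `connectionString` is assumed to come from your role configuration; the enqueue half goes in the web role's code and the polling half in the worker role's Run loop:

```csharp
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

var account = CloudStorageAccount.Parse(connectionString);
var workQueue = account.CreateCloudQueueClient().GetQueueReference("work");
workQueue.CreateIfNotExists();

// Web role (master): enqueue a job
workQueue.AddMessage(new CloudQueueMessage("do-this-job"));

// Worker role: poll for a job
var message = workQueue.GetMessage();
if (message != null)
{
    // ... run the program, send the output back on a "results" queue ...
    workQueue.DeleteMessage(message);
}
```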