Currently I'm investigating the possibility of using Azure Service Fabric and its Reliable Services to implement my problem domain architecture.
Problem domain: I am currently doing research on distributed, large-scale web-crawling architectures involving dozens of parallel agents that crawl web servers and download resources for further indexing.
I've found a useful academic paper that describes an Azure-based distributed web-crawling architecture (Link to .pdf paper), and I'm trying to implement and try out a prototype based on this design.
The basic high-level design looks something like the figure below:
The idea: the Central Web Crawling System Engine (hereafter CWCE) runs in an infinite loop until the program is aborted, fetching Service Bus queue messages that contain the URLs of pages to be crawled. The CWCE component then checks the hostname of each URL and consults the Agent Registrar SQL database to see whether an alive agent already exists for that hostname. If not, the CWCE does one of the following:
If the number of alive agents (A_alive) equals the Max value (an upper bound on the number of agents, provided by the application administrator), the CWCE waits until A_alive < Max.
If A_alive < Max, the CWCE tries to create a new agent and assign the hostname to it (the agent is then registered in the SQL Registrar database).
Each agent runs on its own partition (a URL hostname, for example example.com) and recursively crawls only pages of that hostname, while discovering URLs with external hostnames and adding them to the Service Bus queue for other agents to process.
The benefit of this architecture would be horizontal scaling of agents and near-linear scaling of crawling throughput with the number of agents.
However, I am very new to Azure Service Fabric and would therefore like to ask whether this PaaS layer is capable of solving this problem. My main questions:
Would it be possible to manually create new web-crawling agent instances through code and pass them a hostname parameter using Azure Service Fabric? (Maybe using the FabricClient class to manipulate the cluster and create service instances?)
Which ASF programming model fits this parallel long-running agents scenario best: stateless services, stateful services, or the Actor model? Each agent might run as a long-running task, since it recursively crawls the URLs of a specific hostname and listens to the queue.
Would it be possible to control and change this upper bound (the Max number of alive agents) at runtime?
Would it be possible to have an infinite-loop stateless CWCE service component that continuously listens for queue messages in order to spawn new agents?
I am not sure whether ASF is the best solution for this distributed web-crawling use case, so your insights would be very valuable. Any helpful resource links would also be much appreciated.
Service Fabric will allow you to implement the architecture that you want.
Would it be possible to manually create new web-crawling agent instances through code and pass them a hostname parameter using Azure Service Fabric? (Maybe using the FabricClient class to manipulate the cluster and create service instances?)
Yes. The service you develop and deploy to Service Fabric will be a ServiceType. Service types don't actually run; instead, from the ServiceType you create the actual services, which are named. A single service (e.g. ServiceA) will have a number of instances, to allow for scaling and availability. You can programmatically create and remove services of a given type and pass parameters to them, so every service will know which URL to crawl.
Check an example here.
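To make that concrete, here's a minimal sketch of creating and removing a named agent service with FabricClient, passing the hostname as InitializationData. The application and service type names are assumptions for illustration:

```csharp
using System;
using System.Fabric;
using System.Fabric.Description;
using System.Text;
using System.Threading.Tasks;

// Sketch: create/remove one named crawler service per hostname.
// "fabric:/CrawlerApp" and "CrawlerAgentType" are assumed names.
public static class AgentFactory
{
    public static async Task CreateAgentAsync(string hostname)
    {
        var fabricClient = new FabricClient();

        var description = new StatelessServiceDescription
        {
            ApplicationName = new Uri("fabric:/CrawlerApp"),
            ServiceName = new Uri($"fabric:/CrawlerApp/Agent_{hostname}"),
            ServiceTypeName = "CrawlerAgentType",
            InstanceCount = 1,
            PartitionSchemeDescription = new SingletonPartitionSchemeDescription(),
            // The agent reads this in its constructor/RunAsync to know what to crawl.
            InitializationData = Encoding.UTF8.GetBytes(hostname)
        };

        await fabricClient.ServiceManager.CreateServiceAsync(description);
    }

    public static async Task RemoveAgentAsync(string hostname)
    {
        var fabricClient = new FabricClient();
        var serviceName = new Uri($"fabric:/CrawlerApp/Agent_{hostname}");
        await fabricClient.ServiceManager.DeleteServiceAsync(
            new DeleteServiceDescription(serviceName));
    }
}
```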
Which ASF programming model fits this parallel long-running agents scenario best: stateless services, stateful services, or the Actor model? Each agent might run as a long-running task, since it recursively crawls the URLs of a specific hostname and listens to the queue.
I would choose stateless services, because they will be the most efficient in terms of resource utilization and the easiest to manage (no need to store and manage state, partitioning, and replicas). The only thing you need to consider is that every service will eventually crash and restart, so you need to keep the current crawling position in a permanent store, not in memory.
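As a rough sketch of the agent side: the hostname arrives through the service context's InitializationData, and the crawl position is checkpointed to a permanent store. ICheckpointStore here is a hypothetical abstraction (e.g. over SQL or blob storage), not a Service Fabric API:

```csharp
using System;
using System.Fabric;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Runtime;

// Sketch of a stateless crawler agent that survives restarts by
// resuming from an externally persisted position.
internal sealed class CrawlerAgent : StatelessService
{
    private readonly string hostname;
    private readonly ICheckpointStore checkpoints;

    public CrawlerAgent(StatelessServiceContext context, ICheckpointStore checkpoints)
        : base(context)
    {
        // The hostname was passed as InitializationData at service creation time.
        this.hostname = Encoding.UTF8.GetString(context.InitializationData);
        this.checkpoints = checkpoints;
    }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        // Resume from the last persisted position; the service may have crashed.
        string nextUrl = await checkpoints.GetLastPositionAsync(hostname)
                         ?? $"http://{hostname}/";

        while (!cancellationToken.IsCancellationRequested)
        {
            nextUrl = await CrawlPageAsync(nextUrl, cancellationToken);
            await checkpoints.SavePositionAsync(hostname, nextUrl);
        }
    }

    private Task<string> CrawlPageAsync(string url, CancellationToken ct)
    {
        // Download, index, discover links; details omitted in this sketch.
        throw new NotImplementedException();
    }
}

// Hypothetical permanent-store abstraction.
public interface ICheckpointStore
{
    Task<string> GetLastPositionAsync(string hostname);
    Task SavePositionAsync(string hostname, string url);
}
```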
Would it be possible to control and change this upper bound (the Max number of alive agents) at runtime?
Yes. Service Fabric services run on nodes (virtual machines), which in Azure are managed by virtual machine scale sets. You can easily add and remove nodes from the VMSS, which will allow you to adjust the total compute and memory capacity, and the actual number of services is already controlled by you, as explained in point 1.
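If you also want the Max limit itself to be adjustable at runtime without touching the VMSS, one option is to keep it in the service's Config package and roll out config-only upgrades. A sketch, assuming a "Crawler" section with a "MaxAliveAgents" parameter (both names are assumptions):

```csharp
using System;
using System.Fabric;

// Sketch: read the Max agent limit from the Config package and pick up
// changes applied via a config-only upgrade, without restarting the service.
public sealed class AgentLimit
{
    public int Max { get; private set; }

    public AgentLimit()
    {
        CodePackageActivationContext activationContext = FabricRuntime.GetActivationContext();
        Max = ReadLimit(activationContext.GetConfigurationPackageObject("Config"));

        activationContext.ConfigurationPackageModifiedEvent += (sender, e) =>
        {
            // Fired when a new version of the Config package is rolled out.
            Max = ReadLimit(e.NewPackage);
        };
    }

    private static int ReadLimit(ConfigurationPackage package) =>
        int.Parse(package.Settings.Sections["Crawler"].Parameters["MaxAliveAgents"].Value);
}
```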
Would it be possible to have an infinite-loop stateless CWCE service component that continuously listens for queue messages in order to spawn new agents?
Absolutely. Message-driven microservices are very common. Technically it's not an infinite loop, but a service with a Service Bus communication listener. I found one here as a reference, but I don't know if it's production-ready.
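If you'd rather keep it simple, a pull-based loop in RunAsync also works. A rough sketch using the Microsoft.Azure.ServiceBus client; the queue name is assumed, the connection string is elided, and AgentFactory is the earlier sketch:

```csharp
using System;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;
using Microsoft.Azure.ServiceBus.Core;

// Sketch of the CWCE loop: pull URL messages from the queue and make sure
// an agent service exists for each newly seen hostname.
internal static class CwceLoop
{
    public static async Task RunAsync(CancellationToken cancellationToken)
    {
        var receiver = new MessageReceiver(
            "Endpoint=sb://...", // elided connection string
            "crawl-queue");      // assumed queue name

        while (!cancellationToken.IsCancellationRequested)
        {
            Message message = await receiver.ReceiveAsync(TimeSpan.FromSeconds(30));
            if (message == null) continue; // no message arrived within the timeout

            var url = new Uri(Encoding.UTF8.GetString(message.Body));

            // Consult the Agent Registrar and create an agent only if none is
            // alive for this hostname and A_alive < Max (registrar check omitted).
            await AgentFactory.CreateAgentAsync(url.Host);

            await receiver.CompleteAsync(message.SystemProperties.LockToken);
        }
    }
}
```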
I am new to using Service Fabric and am trying to scope out some design options. I have a class library which performs different tasks. Some tasks are resource intensive and long-running (processing messages from queues) and others are short-lived and must be responsive (handling job requests from users). There is a significant amount of cached data so shared processes make sense, and the application is stateless. I want to make sure that long-running tasks don't starve other tasks of resources, but also that the utilisation rate is high.
Is it possible to make one Stateless Service project in my solution (referencing my class library) and deploy multiple named StatelessService instances sharing the same process, using configuration to differentiate the tasks performed by those instances? With or without multiple ServiceTypes (although they seem to be one per project, so I assume this must be one ServiceType)?
If so, is it possible to apply different resource governance rules to those service instances so some resources can be reserved for user-driven tasks? So far I get the impression that this isn't possible when the services share a process.
The documentation describes the default shared process model like this:
The preceding section describes the default hosting model provided by Service Fabric, referred to as the Shared Process model. In this model, for a given application, only one copy of a given ServicePackage is activated on a node (which starts all the CodePackages contained in it). All the replicas of all services of a given ServiceType are placed in the CodePackage that registers that ServiceType. In other words, all the replicas of all services on a node of a given ServiceType share the same process.
You can specify multiple service types and multiple code packages.
ServiceTypes declares what service types are supported by CodePackages in this manifest. When a service is instantiated against one of these service types, all code packages declared in this manifest are activated by running their entry points. The resulting processes are expected to register the supported service types at run time. Service types are declared at the manifest level and not the code package level. So when there are multiple code packages, they are all activated whenever the system looks for any one of the declared service types.
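To make that concrete, here's a sketch of a service manifest declaring two service types backed by a single code package; all names and versions are assumptions:

```xml
<!-- Sketch: two service types registered by one code package (one process). -->
<ServiceManifest Name="WorkerPkg" Version="1.0.0"
                 xmlns="http://schemas.microsoft.com/2011/01/fabric">
  <ServiceTypes>
    <StatelessServiceType ServiceTypeName="QueueWorkerType" />
    <StatelessServiceType ServiceTypeName="JobHandlerType" />
  </ServiceTypes>
  <CodePackages>
    <CodePackage Name="Code" Version="1.0.0">
      <EntryPoint>
        <ExeHost>
          <Program>Worker.exe</Program>
        </ExeHost>
      </EntryPoint>
    </CodePackage>
  </CodePackages>
</ServiceManifest>
```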
Resource governance is configured in the service manifest, not at the instance level.
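For reference, governance policies are attached to the service package or code package in the application manifest, which is why they apply to the shared process rather than to individual named instances. A sketch with assumed names:

```xml
<!-- Sketch: resource limits declared per service/code package, not per instance. -->
<ServiceManifestImport>
  <ServiceManifestRef ServiceManifestName="WorkerPkg" ServiceManifestVersion="1.0.0" />
  <Policies>
    <ServicePackageResourceGovernancePolicy CpuCores="2" MemoryInMB="2048" />
    <ResourceGovernancePolicy CodePackageRef="Code" CpuShares="512" MemoryInMB="1024" />
  </Policies>
</ServiceManifestImport>
```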
We have been on Azure since 2010 and have benefited greatly from its performance and reliability in our application. Azure offers a lot of enterprise-level services, and I think the new "Azure Service Fabric" is great.
What I cannot work out from the documentation is how to migrate an "old" Cloud Service to the new Service Fabric. Why do we want to migrate? For horizontal scaling and more reliability.
Currently we have a single-instance cloud service that spins up a lot of subservices. Those subservices are great candidates for microservices. The only problem is that some of these subservices are "runners", i.e. they just cycle over our users database and decide whether an operation (service) has to be run for a particular user or not.
How would you migrate a service like this, considering that more than one instance may run it?
Thanks
The first thing to keep in mind is that once a service is started it keeps running, and its lifecycle and uptime are controlled by Service Fabric (e.g. it will restart it automatically if it crashes). The second thing to keep in mind is that you will end up with multiple instances of the service running at the same time (on different nodes), so they will end up doing the exact same thing on different nodes of your cluster.
Your first reflex might be to have one stateless service kind/instance per runner "subservice" that keeps running and leverages RunAsync (https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-reliable-services-advanced-usage). Personally, I wouldn't take that approach, since it would require some kind of synchronization between services to prevent useless concurrency, given that they all do the exact same thing independently.
A better approach would be to have your runner services run only once in a while, when requested by a "main" service acting as an orchestrator: with a queue-based approach, the "main" service submits tasks (messages) to be processed by the runners, which listen concurrently on the same queue, ensuring that at most one service instance completes each task.
For the Queue, think Service Bus or Reliable Concurrent Queue (https://learn.microsoft.com/en-us/dotnet/api/microsoft.servicefabric.data.collections.preview.ireliableconcurrentqueue-1).
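A sketch of the orchestrator side, using Service Bus as the example; the queue and payload format are assumptions, and the runners would receive from the same queue so each message is processed by only one of them:

```csharp
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

// Sketch: the "main" orchestrator enqueues one task per user; competing
// runner instances consume from the same queue.
public static class Orchestrator
{
    public static async Task DispatchAsync(QueueClient queue, string[] userIds)
    {
        foreach (var userId in userIds)
        {
            // Payload format is an assumption; any serialized task descriptor works.
            await queue.SendAsync(new Message(Encoding.UTF8.GetBytes(userId)));
        }
    }
}
```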
I have four services running on Azure Service Fabric, but two of those four services depend on another one. Is there a way to make a service's initialization wait until another service announces that it is ready?
No. There's no ordering to service creation (services can be created at any time, not just during a deployment from your build machine), and what does it even mean for your service to be ready? From our perspective it means the Failover Manager found nodes that the service is able to run on and the code packages have been activated on those nodes. The platform doesn't know what your service code does though. From your perspective it probably means "when it's responding to my requests" otherwise it's not "ready," which can happen at any time during the service's lifetime for any number of reasons:
Service was just deployed and its communication stack hasn't opened an endpoint yet
Service instance/replica moved and its communication stack is spinning back up on a new node
Service partition is in quorum loss and not accepting write operations
etc.
This is an ongoing thing that your services need to be prepared to handle. If two of your services can't do any work until they are able to talk to another service, then they need to poll for the service they depend on until it's available, through an endpoint on that service that you define.
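A minimal polling sketch, assuming the dependency exposes an HTTP endpoint you can probe (the "/ready" route would be something you define yourself):

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Sketch: block until a dependency's endpoint responds, retrying on failure.
public static class DependencyGate
{
    public static async Task WaitForAsync(Uri readyEndpoint, CancellationToken ct)
    {
        using (var http = new HttpClient())
        {
            while (true)
            {
                ct.ThrowIfCancellationRequested();
                try
                {
                    var response = await http.GetAsync(readyEndpoint, ct);
                    if (response.IsSuccessStatusCode) return; // dependency is up
                }
                catch (HttpRequestException)
                {
                    // Endpoint not reachable yet (moving replica, closed listener, ...).
                }
                await Task.Delay(TimeSpan.FromSeconds(5), ct);
            }
        }
    }
}
```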
I'm seeing a definite non-round-robin load-balancing pattern in Azure's load balancer for my cloud role. Most of the requests are going to the 1st instance of my two-instance Web API worker role setup.
How can I ensure that Azure's LB distributes requests equally?
Note: the first screenshot from CloudMonix's dashboard shows CPU utilization for the 1st instance (60-65% sustained average), and the second screenshot shows CPU utilization for the 2nd instance (2-5% sustained average).
This is consistent across many different times I've looked into this.
Both instances are identical; they just listen for HTTP requests and process them.
There actually is a way of configuring the loadBalancerDistribution for a Cloud Service in the .csdef file. The flaw is that the documentation hasn't been updated :-(
Please look at this article: https://azure.microsoft.com/en-us/blog/azure-load-balancer-new-distribution-mode/
The value of LoadBalancerDistribution can be sourceIP for 2-tuple affinity, sourceIPProtocol for 3-tuple affinity, or none (for no affinity, i.e. 5-tuple).
I'll look into getting the schema article updated to reflect this.
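For illustration, the attribute sits on the endpoint declaration in the .csdef; a sketch with assumed names (worth double-checking against the schema article once it's updated):

```xml
<!-- Sketch: source IP affinity on a web role input endpoint; names are assumptions. -->
<Endpoints>
  <InputEndpoint name="HttpIn" protocol="http" port="80"
                 loadBalancerDistribution="sourceIP" />
</Endpoints>
```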
As for the load distribution - if you have not specifically chosen the 2- or 3-tuple algorithm, you should be running with the 5-tuple.
You can use https://resources.azure.com to look at the current configuration.
I know that CPU is a reflection of load, but the load balancer balances based on network sessions, so please ensure that the CPU load and distribution of network sessions correlate. In your situation I would be surprised if they do not - just a reminder.
Please look at this article to ensure you are not running with keep-alives: Extremely uneven cloud service load-balancing with Azure
I've definitely had the same question in the past, but have noticed that over a sustained period (a few days or more) the requests are balanced between the instances. From my personal research, you cannot configure the load balancing on Azure Cloud Services. Here is a document describing the service definition file, and I would imagine that if it were configurable, it would be in there.
However, you can configure the load balancer more explicitly using Azure Resource Manager.
Service Fabric was just announced at the Build conference. I was reading the scarce documentation about it, and I have a question.
I'm evaluating Service Fabric for hosting CRUD like microservices that are at the moment built in ASP.NET WebApi.
Is Service Fabric geared towards hosting small pieces of functionality that receive data, process it and return the result, rather than hosting CRUD WebApi types of application?
Service Fabric enables the creation of both stateless and stateful microservices.
As the name suggests, any state maintained by an instance of a stateless service will be lost if the node goes down. A new, fresh instance will simply be spun up elsewhere in the cluster.
Stateful services offer the ability to persist state without relying on an external store. Any data stored in a Reliable Collection will be automatically replicated across multiple nodes in the cluster, ensuring that the state is resilient to failures.
A common pattern is to use a stateless service as the client-facing gateway to the application and then have that service direct traffic to the app's partitioned stateful services. This hides the work of resolving partitions from clients, allowing them to target one logical endpoint with all requests.
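A rough sketch of what that gateway-side resolution can look like; the service URI and the key mapping are assumptions for illustration:

```csharp
using System;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Client;

// Sketch: resolve the stateful partition that owns a given key before
// forwarding the request to it.
public static class PartitionLookup
{
    public static async Task<string> ResolveEndpointAsync(string word, CancellationToken ct)
    {
        var resolver = ServicePartitionResolver.GetDefault();

        // Assumed mapping: first letter -> Int64 range partition key.
        long partitionKey = char.ToUpperInvariant(word[0]) - 'A';

        ResolvedServicePartition partition = await resolver.ResolveAsync(
            new Uri("fabric:/MyApp/ProcessingService"), // assumed service URI
            new ServicePartitionKey(partitionKey),
            ct);

        return partition.GetEndpoint().Address;
    }
}
```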
Take a look at the WordCount sample for an example of how this works. The WordCount.WebService stateless service acts as the front end to the application. It simply resolves the partition based on the incoming request and then sends it on. The WordCount.Service stateful service (partitioned based on the first letter of the word) immediately puts those incoming requests in a ReliableQueue and then processes them in the background, storing the results in a ReliableDictionary.
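And a rough sketch of the stateful side of that pattern: dequeue from a ReliableQueue and update a ReliableDictionary in a single transaction, so the two changes commit atomically. Collection names are assumptions:

```csharp
using System;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data.Collections;
using Microsoft.ServiceFabric.Services.Runtime;

// Sketch: background processing of queued words into a reliable dictionary.
internal sealed class WordCountService : StatefulService
{
    public WordCountService(StatefulServiceContext context) : base(context) { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        var queue = await StateManager.GetOrAddAsync<IReliableQueue<string>>("incoming");
        var counts = await StateManager.GetOrAddAsync<IReliableDictionary<string, long>>("counts");

        while (!cancellationToken.IsCancellationRequested)
        {
            using (var tx = StateManager.CreateTransaction())
            {
                var word = await queue.TryDequeueAsync(tx);
                if (word.HasValue)
                {
                    await counts.AddOrUpdateAsync(tx, word.Value, 1, (k, v) => v + 1);
                }
                await tx.CommitAsync(); // dequeue and count update commit together
            }
            await Task.Delay(TimeSpan.FromMilliseconds(100), cancellationToken);
        }
    }
}
```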
For more details, see the Reliable Services Overview.
Note: for now, the best way to expose WebAPI endpoints to clients is to self-host an OWIN server in the stateless service. ASP.NET 5 projects will soon be supported as well.
This video answers my own question: http://channel9.msdn.com/Events/Build/2015/2-704. In summary, we should use stateless services to host ASP.NET-based sites or APIs that persist data to external data stores.
If you don't have state (or keep it externally), a stateless service is the way to start.
The answer to the original question is "both". Basically, anything that has a main() function (with a couple of extra contract methods to talk to Service Fabric) can be a service in the Service Fabric world.