Geo Redundancy in Azure Service Fabric Applications - azure

I'm trying to come up with a solution for achieving Geo-Redundancy (2+ datacentres) while using Service Fabric reliable Actors/Services to manage state. It insinuates here that geo replication is possible
This may happen when, for example, if you aren’t geo replicated and your entire cluster is in one data center, and the entire data center goes down.
but doesn't explain how to switch it on.
Does anybody know if it's a planned feature for ASF that just hasn't been released yet, or whether it's present but not fully explored yet?
Alternatively does anybody have any recommended approaches for cross DC resilience when the state required to run the app is stored using ASF's StateManager?
thanks,
Alex

Alex,
Apparently the service fabric team is still to crack this problem - more info below. However, you should be able to GeoHA Service Fabric Cluster on Azure by yourself. Here's an example of that:
https://alexandrebrisebois.wordpress.com/2016/05/31/deploy-a-geo-ha-service-fabric-cluster-on-azure/
Not today, but this is a common request that we continue to investigate.
The core Service Fabric clustering technology knows nothing about Azure regions and can be used to combine machines running anywhere in the world, so long as they have network connectivity to each other. However, the Service Fabric cluster resource in Azure is regional, as are the virtual machine scale sets that the cluster is built on. In addition, there is an inherent challenge in delivering strongly consistent data replication between machines spread far apart. We want to ensure that performance is predictable and acceptable before supporting cross-regional clusters. Source: https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-common-questions
Cheers,
Paulo

There is no reason you cannot install a series of nodes in different regions as part of the same Fabric, and use placement constraints to control service allocation. As long as the nodes can properly communicate with each other, there should be no problem with this.
If you're using Azure, you should deploy them to Virtual Networks, and link them together using VPNs. You could even cross to on-prem.

I believe the answer would be to use a custom replicator implementation and bridging multiple clusters with expressroute.

Related

Service Fabric: Looking for ways to balance load between services or actors inside one application

We're considering using Service Fabric on-premises, fully or partially replacing our old solution built based on NServiceBus, though our knowledge about SF is yet a bit limited. What we like about NServiceBus is the out-of-the-box feature to declaratively throttle any service with the maximum amount of threads. If we have multiple services, and one of them starts hiccuping due to some external factors, we do not want other services affected by that. That "problem" service would just take the maximum amount of threads we allocate it with in its configuration, and its queue would start growing, but other services keep working fine as computer resources are still available. In Service Fabric, if we let our application create as many "problem" actors as it wants, it will lead to uncontrollable growth of the "problem" actors that will consume all server resources.
Any ideas on how with SF we can protect our resources in the situation I described? My first impression is that no such things like queuing or actors throttling mechanism are implemented in Service Fabric, and all must be made manually.
P.S. I think it should not be a rare demand for capability to somehow balance resources between different types of actors inside one application, to make them less dependent on each other in regards to consuming resources. I just can't believe there is nothing offered for that in SF.
Thanks
I am not sure how you would compare NServiceBus (which is a messaging solution) with Service Fabric that is a platform for building microservices. Service Fabric is a platform that supports many different types of workload. So it makes sense it does not provide out of the box throttling of threads etc.
Also, what would you expect from Service Fabric when it comes to actors or services when it comes to resource consumption. It is up to you what you want to do and how to react. I wouldn't want SF to kill my actors or throttle service request automatically. I would expect mechanisms to notify me when it happens and those are available.
That said, SF does have a mechanism to react on load using metrics, See the docs:
Metrics are the resources that your services care about and which are provided by the nodes in the cluster. A metric is anything that you want to manage in order to improve or monitor the performance of your services. For example, you might watch memory consumption to know if your service is overloaded. Another use is to figure out whether the service could move elsewhere where memory is less constrained in order to get better performance.
Things like Memory, Disk, and CPU usage are examples of metrics. These metrics are physical metrics, resources that correspond to physical resources on the node that need to be managed. Metrics can also be (and commonly are) logical metrics. Logical metrics are things like “MyWorkQueueDepth” or "MessagesToProcess" or "TotalRecords". Logical metrics are application-defined and indirectly correspond to some physical resource consumption. Logical metrics are common because it can be hard to measure and report consumption of physical resources on a per-service basis. The complexity of measuring and reporting your own physical metrics is also why Service Fabric provides some default metrics.
You can define you're own custom metrics and have the cluster react on those by moving services to other nodes. Or you could use the Health Reporting system to issue a health event and have your application or outside process act on that.

ThingWorx Horizontal Scalability

What architecture and application development best practices must be followed in order to scale a TWX application?
The majority of applications start with few devices but with time they quickly build up to thousands of devices. Once the amount of traffic is too much for one TWX instance what strategy should be followed?
The same question applies when the front end is overwhelmed by the number of users.
Anytime I have had ThingWorx architecture concerns, I have been redirected to the PTC ThingWorx guide linked below. I do not believe you need a PTC account to view it, but if so it is free.
ThingWorx 8 High Availability Administrators Guide
http://support.ptc.com/WCMS/files/173281/en/ThingWorx_8_High_Availability_Administrators_Guide.pdf
In your case where you have big load concerns, the guide recommends using
two ThingWorx instances to handle the load.
At least two ThingWorx instances are required for HA configuration. A
single instance is started, which becomes leader and fully connects to
the database. Standby servers boot up and can become the leader if
needed, but they do not fully connect to the database or load
information like the leader does. All ThingWorx servers have a service
that is called by the load balancer, which indicates their
availability. Different codes identify the leader, which receives
traffic, and standby nodes, which do not receive traffic but may
become leader.
High-Level Architecture example from the referenced guide:
The Load Balancer determines which ThingWorx instance is to be used by the user. Usually it is used to determine which is available in a redundant architecture (which is what makes it Highly Available). However, it can also be used to determine which to use based on performance. In PTC's HA Admin Guide, they use HAProxy (see page 47) as the Load Balancer. See Section 3.2 of the HAProxy Config Doc for how to configure based on performance.
Hope this helps! It is a pretty open-ended topic
With ThingWorx 9.0 release, the ThingWorx Foundation platform supports true horizontal scalability with an active-active clustering setup providing no single points of failure. The document here provides the details about the install and setup.
There is also a ThingWorx 9.0 deployment architecture guide for an overview of all the architectural details.
ThingWorx High Availability Clustering setup image

Using Hyperledger Fabric in production

I am using HF for some time and trying different things regarding business network specification and configuration.
But, I have couple of question regarding best practices (if there are any yet) in using HF in production.
When we talk about using HF in production, should we use docker-compose-base.yaml, docker-compose-cli.yams, cofigtx.yaml.... etc. as files used to setup and configure our business network, and if not, can you please specify what is the best practice use-case?
Thank you for your answers.
You could use Docker Swarm/Compose with derivatives of the sample compose files you referenced, or you could use Kubernetes to manage a network (or subset of same). Project Cello is working on delivering such capability. The Ansible driver in particular has been demonstrated to work effectively - though it is far from a 1.0 level of maturity.
The reality is that you'll want to manage (likely) more than just four peer nodes all on the same VM or host, but manage multiple peers on multiple VMs/hosts even across multiple networks for a production deployment.
Further, you will obviously need to add management and monitoring to the deployed containers for a true production experience. The Hyperledger chat and mailing lists can be good sources of help and insight.

How are OS configuration changes controlled when using Service Fabric?

When using Azure web/worker roles users can specify osVersion to explicitly set "Guest OS image" version. This ensures that when Microsoft issues new critical updates they are first shown up on a newer "OS image" which users can explicitly specify and test their service on.
How is the same achieved with Azure Service Fabric? Suppose I deployed my service into Azure Service Fabric and it's been running for a month, then Microsoft issues updates for the OS on the server where the service is running - how are they applied such that I can test them first to ensure they don't break the service?
Brett is correct. SF cluster is based on Azure VMSS and the expectation is that the customer is responsible to patch the OS. https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-upgrade/
We have heard from majority of the SF customers that this is not at all desirable and that they do not want to be responsible for OS patching.
The feature to enable an OPT-IN automatic OS patching is indeed a very high priority within Azure Compute team. The exact details on how best to offer this is still in design, however the intent is to have this functionality enabled before the end of the year.
Although that is the right long term solution, to mitigate this issue in the short term, SF team is working on a set of steps that will enable the customers to opt into having the their VMs patched using WU in a safe manner. Once the steps are tested out, we will blog about it and will publish a document detailing the steps. Expect that in the next couple of months.
As I understand it you are currently responsible for managing patching on SF cluster nodes yourself. Apparently moving this to be a SF managed feature is planned but I have no idea how far down the road it might be.
I personally would make this a high priority. Having used Cloud Services for many years I have come to rely on never having to patch my VM's manually. SF is a large backwards step in this particular area.
It'd be great to hear from an Azure PM on this...
Automatic Image based patching like cloud services in service fabric.
Today you do not have that option. The image based patching capability is work in progress. I posted a road map to get there on the team blog : https://blogs.msdn.microsoft.com/azureservicefabric/2017/01/09/os-patching-for-vms-running-service-fabric/ Try out the script and report any issues you hit. Looking forward to your feedback.
Lots of parts of Service Fabric are huge rolling dumpster fires backwards. Whole new hosts of problems have been introduced that the IIS/WAS/WCF team have already solved that need to be developed for once again. The concept of releasing a PAAS platform while requiring OS patch management is laughable. To add insult to injury there is no migration path from "classic cloud PAAS" to this stuff. WEEEE I get to write my very own service host. Something that was provided out of the box for a decade by WAS. Not all of us were scared by the ability to control all aspects of service host communication options via configuration. Now we get to use code so a tweak channel configuration requires a full patch/release cycle!

provisioning hosted solution for SMEs on azure

i intend to build software for SMEs on the azure platform that can be provisioned for different clients..what i mean is, once the client signs up, a new instance is automatically created for them on the azure platform.
Does anyone have any experience with building such solutions or are their any commercial packages like that available?
thanks
It sounds like you're planning to have a single-tenant system, where each instance is slightly different then others and is customized for each client slightly differently. If this is the case, Azure in general will not be a great platform for you. It thrives on providing a dynamic quantity of exactly-alike instances. Furthermore, having one instance per client is a bad idead, as instances are slightly volatile. MS may choose to bring one down for an upgrade, or instance may simply crash, and SLA is only inforced when 2+ instances are running.
I'd like to suggest that you consider multi-tenant environment, where your system shards itself virtually via database/architecture/etc. Do not tie instances to quantity of clients, but to actual load.
Now, if you want to spin up exactly same instances when new clients sign up, check out dynamic scaling service for Azure called AzureWatch # http://www.paraleap.com - its main premise to scale your instances to load, but with a few simple queue/table inserts it can programmatically scale you up or down. Contact me there if you think this will work for you, and ill be glad to explain how this can be done

Resources