I am exploring the notion of using Hazelcast (or any other caching framework) to advertise services within a cluster. Ideally, when a cluster member departs, its services (or the objects advertising them) should be removed from the cache.
Is this at all possible?
It is possible for sure.
The question is which solution you prefer.
If the services can be stored in a map, you could create a map with a ttl of e.g. a few minutes and each member needs to refresh its service to prevent the services from expiring.
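As a rough illustration of the TTL idea, here is a minimal plain-Python sketch. It is not Hazelcast code: the class `TtlServiceRegistry` and its methods are hypothetical names, and the in-process dictionary only stands in for a distributed map with a time-to-live. It shows the refresh pattern, where each member re-advertises before the TTL runs out.

```python
import time


class TtlServiceRegistry:
    """Plain-Python stand-in for a map whose entries expire after a TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # service name -> (member id, expiry time)

    def advertise(self, service, member):
        # Called on registration and again on every heartbeat; resets the expiry.
        self._entries[service] = (member, self._clock() + self._ttl)

    def lookup(self, service):
        entry = self._entries.get(service)
        if entry is None:
            return None
        member, expires_at = entry
        if self._clock() >= expires_at:
            # The owner stopped refreshing, so treat the service as gone.
            del self._entries[service]
            return None
        return member
```

The trade-off is that a departed member's services stay visible until the TTL elapses, so the TTL bounds how stale the registry can be.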
An alternative solution is to listen to member changes using the MembershipListener; once a member leaves, the services that belong to that member need to be removed from the map.
If you don't like either of these, you could create your own SPI-based implementation. The SPI is the lower-level infrastructure used by Hazelcast to create its distributed data structures. It is a lot more work, but also gives a lot of flexibility.
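The listener approach could look roughly like the following plain-Python sketch. It does not use Hazelcast: `ServiceDirectory` is a hypothetical class, and `member_removed` only mirrors the kind of callback a membership listener would receive when a member leaves. The extra index from member to services is what makes the cleanup cheap.

```python
from collections import defaultdict


class ServiceDirectory:
    """Tracks which member advertises which service (illustrative only)."""

    def __init__(self):
        self._services = {}                  # service name -> member id
        self._by_member = defaultdict(set)   # member id -> service names

    def advertise(self, service, member):
        previous = self._services.get(service)
        if previous is not None:
            # The service moved; drop it from the old owner's index.
            self._by_member[previous].discard(service)
        self._services[service] = member
        self._by_member[member].add(service)

    def member_removed(self, member):
        # Would be invoked by the cluster when a member departs.
        removed = self._by_member.pop(member, set())
        for service in removed:
            del self._services[service]
        return sorted(removed)

    def lookup(self, service):
        return self._services.get(service)
```

Compared with the TTL approach, removal is immediate, but it depends on some surviving member receiving the event and performing the cleanup.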
So there are many solutions.
Background
We are looking at porting a 'monolithic' 3-tier web app to a microservices architecture. The web app displays listings to a consumer (think Craigslist).
The backend consists of a REST API that calls into a SQL DB and returns JSON for a SPA to build a UI (there's also a mobile app). Data is written to the SQL DB via background services (FTP + worker roles). There are also some pages that allow writes by the user.
Information required:
I'm trying to figure out whether (and how) Azure Service Fabric would be a good fit for a microservices architecture in my scenario. I know the pros/cons of microservices vs. monolith, but I'm trying to figure out how the various microservice programming models apply to our current architecture.
Questions
Is Azure Service Fabric a good fit for this? If not, other recommendations? Currently I'm leaning towards a bunch of OWIN-based .NET web sites, split up by area/service, each hosted on its own machine and tied together by an API gateway.
Which Service Fabric programming model would I go for? Stateless services with their own backing DB? I can't see how the Stateful or Actor model would help here.
If I went with Stateful services/Actors, how would I go about updating data as part of a maintenance/ad-hoc admin request? Traditionally we would simply log in to the DB and update the data, and the API would return the new data - but if it's persisted in-memory/across nodes in a cluster, how would we update it? Would I have to expose this all via methods on the service? Similarly, how would I import my existing SQL data into a stateful service?
For the Stateful services/Actor model, how can I 'see' the data visually, with an object explorer/UI? Our data is our gold, and I'm concerned about the lack of control over it and visibility into it in the Reliable Services models.
Basically, is there some documentation on the decision path towards which programming model to go for? I could model a "listing" as an Actor, and have millions of those - sure, but I could also have a Stateful service that stores the listing locally, and I could also have a Stateless service that fetches it from the DB. How does one decide which is the best approach for a given use case?
Thanks.
What is it about your current setup that isn't meeting your requirements? What do you hope to gain from a more complex architecture?
Microservices aren't a magic bullet. You mainly get four benefits:
You can scale and distribute pieces of your overall system independently. Service Fabric has very sophisticated tools and advanced capabilities for this.
You can deploy and upgrade pieces of your overall system independently. Service Fabric again has advanced capabilities for this.
You can have a polyglot system - each service can be written in a different language/platform.
You can use conflicting dependencies - each service can have its own set of dependencies, like different framework versions.
All of this comes at a cost and introduces complexity and new ways your system can fail. For example: your fast, compile-time checked in-proc method calls now become slow (by comparison to an in-proc function call), failure-prone network calls. And these are not specific to Service Fabric, btw; this is just what happens when you go from in-proc method calls to cross-machine I/O - it doesn't matter what platform you use. The decision path here is a pro/con list specific to your application and your requirements.
To answer your Service Fabric questions specifically:
Which programming model do you go for? Start with stateless services with ASP.NET Core. It's going to be the simplest translation of your current architecture that doesn't require mucking around with your data layer.
Stateful has a lot of great uses, but it's not necessarily a replacement for your RDBMS. A good place to start is hot data that can be stored in simple key-value pairs, is accessed frequently and needs to be low-latency (you get local reads!), and doesn't need to be datamined. Some examples include user session state, cache data, a "snapshot" of the most recent items in a data stream (like the most recent stock quote in a stream of stock quotes).
Currently the only way to see or query your data is programmatically directly against the Reliable Collection APIs. There is no viewer or "management studio" tool. You have to write (and secure) an API in each service that can display and query data.
Finally, the actor model is a very niche model. It serves specific purposes but if you just treat it as a data store it will not work for you. Like in your example, a listing per actor probably wouldn't work because you can't query across that list, or even have multiple users reading the same listing simultaneously.
I'm currently evaluating using Hazelcast for our software. Would be glad if you could help me elucidate the following.
I have one specific requirement: I want to be able to configure distributed objects (say maps, queues, etc.) dynamically. That is, I can't have all the configuration data at hand when I start the cluster. I want to be able to initialise (and dispose) services on-demand, and their configuration possibly to change in-between.
The version I'm evaluating is 3.6.2.
The documentation I have available (Reference Manual, Deployment Guide, as well as the "Mastering Hazelcast" e-book) is very skimpy on details w.r.t. this subject, and even partially contradictory.
So, to clarify an intended usage: I want to start the cluster; then, at some point, create, say, a distributed map structure, use it across the nodes; then dispose it and use a map with a different configuration (say, number of backups, eviction policy) for the same purposes.
The documentation mentions, and this is to be expected, that bad things will happen if nodes have different configurations for the same distributed object. That makes perfect sense and is fine; I can ensure that the configs will be consistent.
Looking at the code, it would seem to be possible to do what I intend: when creating a distributed object, if it doesn't already have a proxy, the HazelcastInstance will go look at its Config to create a new one and store it in its local list of proxies. When that object is destroyed, its proxy is removed from the list. On the next invocation, it would go reload from the Config. Furthermore, that config is writeable, so if it has been changed in-between, it should pick up those changes.
So this would seem like it should work, but given how silent the documentation is on the matter, I'd like some confirmation.
Is there any reason why the above shouldn't work?
If it should work, is there any reason not to do the above? For instance, are there plans to change the code in future releases in a way that would prevent this from working?
If so, is there any alternative?
Changing the configuration of an already created distributed object on the fly is not possible in the current version, although there is a plan to add this feature in a future release. Once a map is created, its config stays at the node level, not at the cluster level.
As long as you are creating the Distributed map fresh from the config, using it and destroying it, your approach should work without any issues.
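The create/use/destroy cycle described in the question can be modelled with a toy sketch in plain Python. This is only a model of the behaviour as described above, not Hazelcast's implementation: `ToyInstance`, its `config` dictionary and the `backup_count` setting are hypothetical stand-ins. The point it illustrates is that settings are read once, when the proxy is created, so a changed config only takes effect after the object is destroyed and recreated.

```python
class ToyInstance:
    """Toy model: config is consulted only when a proxy is first created."""

    def __init__(self):
        self.config = {}      # map name -> settings dict (writeable at any time)
        self._proxies = {}    # map name -> live proxy

    def get_map(self, name):
        if name not in self._proxies:
            # Snapshot the settings at creation time (placeholder default below).
            settings = dict(self.config.get(name, {"backup_count": 1}))
            self._proxies[name] = {"settings": settings, "data": {}}
        return self._proxies[name]

    def destroy(self, name):
        # Dropping the proxy forces the next get_map() to re-read the config.
        self._proxies.pop(name, None)
```

In a real cluster the same config change would have to be applied on every node before the object is recreated, since the config lives at the node level.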
I'm trying to come up with a solution for achieving geo-redundancy (2+ datacentres) while using Service Fabric Reliable Actors/Services to manage state. It is implied here that geo-replication is possible:
This may happen when, for example, if you aren’t geo replicated and your entire cluster is in one data center, and the entire data center goes down.
but doesn't explain how to switch it on.
Does anybody know if it's a planned feature for ASF that just hasn't been released yet, or whether it's present but not fully explored yet?
Alternatively does anybody have any recommended approaches for cross DC resilience when the state required to run the app is stored using ASF's StateManager?
thanks,
Alex
Alex,
Apparently the Service Fabric team has yet to crack this problem - more info below. However, you should be able to build a geo-HA Service Fabric cluster on Azure yourself. Here's an example of that:
https://alexandrebrisebois.wordpress.com/2016/05/31/deploy-a-geo-ha-service-fabric-cluster-on-azure/
Not today, but this is a common request that we continue to investigate.
The core Service Fabric clustering technology knows nothing about Azure regions and can be used to combine machines running anywhere in the world, so long as they have network connectivity to each other. However, the Service Fabric cluster resource in Azure is regional, as are the virtual machine scale sets that the cluster is built on. In addition, there is an inherent challenge in delivering strongly consistent data replication between machines spread far apart. We want to ensure that performance is predictable and acceptable before supporting cross-regional clusters. Source: https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-common-questions
Cheers,
Paulo
There is no reason you cannot install a series of nodes in different regions as part of the same Fabric, and use placement constraints to control service allocation. As long as the nodes can properly communicate with each other, there should be no problem with this.
If you're using Azure, you should deploy them to Virtual Networks, and link them together using VPNs. You could even cross to on-prem.
I believe the answer would be to use a custom replicator implementation and bridge multiple clusters with ExpressRoute.
While discovering SF Reliable Services I want to make sure that the following basic statements are true.
The Reliable Services default communication stack (DefaultStack) and the Reliable Actors communication stack (using ServiceProxy/ActorProxy) can only be used for communicating inside the SF cluster. Clients outside the cluster must use the Web API/WCF stacks.
ServicePartitionResolver, CommunicationClientFactory and ServicePartitionClient are already implemented inside DefaultStack. I don't have to worry about them if I use only DefaultStack.
Suppose a stateful service has more than one partition, and I want, for example, to post an item to it for processing. It is not SF's responsibility to decide which partition the posting client should use. I need to manually implement an algorithm that resolves the partition key or name and pass it to the ServiceProxy constructor (for DefaultStack).
You're correct on all those points:
If you want to communicate outside Service Fabric you need to use something like an OwinCommunicationListener (see here).
You’d only have to implement those if you wanted to plug in your own communication stack.
Yep, you’d need to define the partition key when you’re creating a ServiceProxy.
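One common way to resolve a partition key yourself is to hash a stable item identifier into the signed 64-bit key space. The plain-Python sketch below illustrates that idea only; it is not Service Fabric code, the function names are hypothetical, and it assumes the service uses equal-width ranges over the full int64 space. A cryptographic digest is used instead of Python's built-in `hash()` because the latter is randomised per process for strings.

```python
import hashlib

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def partition_key(item_id: str) -> int:
    """Derive a stable signed 64-bit key from an item id."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def partition_index(key: int, partition_count: int) -> int:
    """Index of the equal-width int64 range that contains key."""
    width = 2 ** 64 // partition_count
    # Clamp so keys at the very top of the space land in the last range.
    return min((key - INT64_MIN) // width, partition_count - 1)
```

The caller would compute `partition_key(...)` once per item and supply it when creating the proxy; `partition_index` is shown only to make the key-to-range mapping visible.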
The Azure Service Fabric appears to be focused on scenarios in which all data can fit within RAM and persistence is used as a backing store. Reliable Services are designed to store information in Reliable Collections, which use a log-checkpoint system where logged information is written into RAM. Meanwhile, for Reliable Actors, the default actor state provider is "the distributed Key-Value store provided by Service Fabric platform." This seems to indicate that the same limitations would apply.
There may, however, be situations in which one would like to use the Service Fabric for "hot data" but write "cold data" to some form of permanent storage. What are best practices for handling this transition?
In Orleans, this seems to be handled automatically, using a persistence store such as Azure tables. But it seems that a principal design purpose of the Service Fabric and the Reliable Collections is to avoid needing external services, thus enhancing data locality. The current documentation anticipates the possibility that one would want to move data into some permanent store for disaster recovery and analytics, but it does not discuss the possibility of moving data back and forth between persistence-backed in-memory actors and more permanent forms of storage.
A possible answer is that the Service Fabric already does this. Maybe a Reliable Dictionary has some built-in mechanism for switching between persistence-backed in-memory storage and permanent storage.
Or, maybe the answer is that one must manage this oneself. One approach might be for an Actor to keep track of how "hot" it is and switch its persistence store as necessary. But this sacrifices one of the benefits of the Actor model, the automatic allocation and deallocation of actors. Similarly, we might periodically remove items from the Reliable Dictionary and add them to some other persistence store, and then add them back. Again, though, this requires knowledge of when it makes sense to make the transition.
A couple of examples may help crystallize this:
(1) Suppose that we are implementing a multiplayer game with many different "rooms." We don't need all the rooms in memory at once, but we need to move them into memory and use local persistence as a backup once players join them.
(2) Suppose that we are implementing an append-only B-Tree as part of a database. The temptation would be to have each B-Tree node be a stateful actor. We would like hot b-trees to remain in memory but of course the entire index can't be in memory. It seems that this is a core scenario that is already implemented for things like DocumentDB, but it's not clear to me from the documentation how one would do this.
A related question that I found is here. But that question focuses on when to use Azure Service Fabric vs. external services. My question is on whether there is a need to transition between them, or whether Azure Service Fabric already has all the capability needed here.
The Key-Value store state provider does not require everything to be kept in memory. This provider actually stores the state of all actors on the local disk and the state is also replicated to the local disk on other nodes. So the KVS store is considered a persistent and reliable store.
In addition to that, the state of active actors is also stored in memory. When an actor hasn't been used in a while, it gets deactivated and garbage collected. When this happens, the in-memory copy is freed and only the copy on disk remains. When the actor is activated again, the state is fetched from disk and remains in memory as long as the actor is active.
Also, KVS is not the only built-in state provider. We also have the VolatileActorStateProvider (http://azure.microsoft.com/en-gb/documentation/articles/service-fabric-reliable-actors-platform/#actor-state-provider-choices). This is the state provider that keeps everything in memory.
The KvsActorStateProvider does indeed store actor state in a KeyValueStore which is a similar structure to the ReliableDictionary.
The first question I'd ask is whether you need to relegate old actors' state to cold storage at all. Keeping everything in memory doesn't limit the total number of actors, only the number per replica. So you must first consider the partitioning strategy so that your actors are distributed across a number of different replicas. As your demands grow you can then add more machines to the cluster, and Service Fabric will orchestrate the movement of replicas to the new machines. For more information on partitioning of the Actor service, see http://azure.microsoft.com/en-gb/documentation/articles/service-fabric-reliable-actors-platform/
If you do want to use cold storage after some time, then you have a couple of options. Firstly, you could decorate your actors with a custom ActorStateProviderAttribute that returns your own implementation of an IActorStateProvider that can handle persistence as you decide.
Alternatively, you could handle it entirely within your Actor implementation by hooking into the actor lifecycle. In OnDeactivateAsync (called when the instance is deactivated, before it is garbage collected), or in an Actor Reminder scheduled for some specified time in the future, serialise the state, store it in cold storage such as blob or table storage, and null out the State property. The OnActivateAsync override can then be used to retrieve this state from cold storage and deserialise it.
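The lifecycle-hook approach can be sketched as follows in plain Python. This is not the Reliable Actors API: `ColdStore`, `RoomActor`, `on_activate` and `on_deactivate` are hypothetical names that only mirror the activate/deactivate hooks, and the in-memory dictionary stands in for blob or table storage.

```python
import json


class ColdStore:
    """Stand-in for blob/table storage: serialised state keyed by actor id."""

    def __init__(self):
        self.blobs = {}

    def save(self, actor_id, payload):
        self.blobs[actor_id] = payload

    def load(self, actor_id):
        return self.blobs.get(actor_id)


class RoomActor:
    """Game-room actor that parks its state in cold storage while inactive."""

    def __init__(self, actor_id, cold_store):
        self.actor_id = actor_id
        self._cold = cold_store
        self.state = None

    def on_activate(self):
        # Rehydrate from cold storage, or start with an empty room.
        payload = self._cold.load(self.actor_id)
        self.state = json.loads(payload) if payload is not None else {"players": []}

    def on_deactivate(self):
        # Serialise to cold storage, then release the in-memory copy.
        self._cold.save(self.actor_id, json.dumps(self.state))
        self.state = None
```

With this shape the runtime's own idle detection decides when state goes cold, so the actor does not need to track how "hot" it is.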