I am working on a project in which I call Service Fabric methods and return the data to the end user. The data is modified very infrequently, or is almost constant, so I want to maintain a cache and return the cached copy if the data has not been modified.
The project structure is: WebApi (Stateless Service) -> Repository -> StatefulService
What is the best way of implementing this in Azure Service Fabric? I am thinking of two options:
1. Redis cache
   a. Creating a Redis cache project that exposes two endpoints for storing and getting cache data. This project will be referenced in the repository layer.
   b. Creating a Redis cache service (Service Fabric) and calling it from the repository.
2. Stateful service
   a. Creating a separate dictionary in the existing stateful service and using it for getting and storing data.
I also have the following concerns.
Approach #1:
We have to depend on a third-party system (Redis cache), and we might not get accurate results if the server is unavailable.
Approach #2:
We might get a performance issue if the cached data grows over time.
Are there any best practices for implementing a cache in Service Fabric?
Reliable Collections were designed for performance, because they run in-process and data is kept in memory if there is enough memory available (which in your case should be fine, for ten thousand records). The only slowdown compared to a regular in-memory dictionary is that a reliable dictionary must maintain transactional consistency while reading, but I presume you need this consistency anyway?
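Whichever store ends up holding the cache, the "return it if the data is not modified" idea from the question can be sketched roughly as below. This is only an illustration in Python, not Service Fabric code: `VersionedCache`, `get_version` and `load` are hypothetical names, and it assumes the stateful service can cheaply report a version number (or last-modified stamp) for each key.

```python
from typing import Callable, Dict, Generic, Tuple, TypeVar

V = TypeVar("V")


class VersionedCache(Generic[V]):
    """Caches a value per key and reuses it while the source version is unchanged."""

    def __init__(self, get_version: Callable[[str], int], load: Callable[[str], V]) -> None:
        self._get_version = get_version   # cheap call: "has the data changed?"
        self._load = load                 # expensive call: fetch the full data
        self._entries: Dict[str, Tuple[int, V]] = {}
        self.loads = 0                    # counts expensive fetches, for illustration

    def get(self, key: str) -> V:
        current = self._get_version(key)
        cached = self._entries.get(key)
        if cached is not None and cached[0] == current:
            return cached[1]              # unchanged: serve from cache
        value = self._load(key)           # missing or stale: reload
        self.loads += 1
        self._entries[key] = (current, value)
        return value


# Stand-in for the stateful service: a (version, payload) pair per key.
store = {"settings": (1, {"theme": "dark"})}
cache = VersionedCache(
    get_version=lambda k: store[k][0],
    load=lambda k: store[k][1],
)

first = cache.get("settings")
second = cache.get("settings")               # served from cache, no reload
store["settings"] = (2, {"theme": "light"})  # the data is modified
third = cache.get("settings")                # version changed, so it is reloaded
```

The version check still costs one call per read, so this only pays off when fetching the version is much cheaper than fetching the data.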
I'm currently looking to securely replicate hundreds of GB of data across a few hundred hosts. I was looking at a Hyperledger Fabric private blockchain because of its use of TLS and a peer-to-peer gossip protocol for data transmission, plus of course the security of the blockchain itself.
Is it reasonable for me to consider using blockchain as a way to securely do data replication? I have not seen this in any blockchain use case, but from what I've read it seems reasonable, even though everything I've read seems to indicate that storing data in the blockchain is a bad idea. Usually the arguments are that it costs too much and that the data has to be replicated across all the peers in the system. Cost isn't a concern in this case because it's a private blockchain, and for my use case the data replication (if it can be done efficiently) is what I'm looking for.
I could use IPFS, Swift, S3, etc. to store the data, but that would add operational burden, especially if Hyperledger Fabric can do the job on its own.
Also, if I use Hyperledger private data collections, how much control over purging do I have? For my use cases, I can't just purge the oldest data, as in some cases older data needs to be preserved for a long time and in some cases newer data can be purged fairly quickly.
On the subject of data replication:
TL;DR: this is not a blockchain solution.
Here's my thinking behind that.
Storing large amounts of data isn't a good idea, as you've mentioned. Yes, the data gets replicated across peers (a side effect that happens to be what you need in this case), but signing, validation, etc. also need to take place across all that data. The processing cost would make it inefficient.
Definition of "securely": you don't say what quality of service would constitute "secure". For example:
Access Control for users to access the data?
Assurance that the data has been replicated and is on disk at remote locations without corruption?
Encryption of data to protect it in transit and at rest.
Blockchain, and I'm thinking of Hyperledger Fabric here, would offer you the assurance. But there's no encryption in transit; you'd need to add that. As for access control, the primitives are there, but you are required to implement and use them.
I would tend to think that the use of blockchain in this scenario would be to provide the audit trail of how the data was replicated between hosts, with some other protocol doing the replication itself.
On the subject of private data collection purging:
Currently this is implemented by purging data once the peer reaches a certain block height, e.g. purge after 42 blocks. But we're working on a feature to allow "purge-on-demand" based on a call from the chaincode.
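To see why height-based purging does not fit the "keep some old data, drop some new data" requirement from the question, here is a toy model of it in Python. It is not Fabric code: `PrivateDataStore` and `blocks_to_live` are hypothetical names, and the exact block at which a real peer purges may differ by one from this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class PrivateDataStore:
    """Toy model: each entry is purged a fixed number of blocks after it was written."""

    blocks_to_live: int
    height: int = 0
    entries: Dict[str, Tuple[int, bytes]] = field(default_factory=dict)

    def put(self, key: str, value: bytes) -> None:
        # Remember the block height at which the entry was written.
        self.entries[key] = (self.height, value)

    def commit_block(self) -> None:
        """Advance the chain by one block and purge entries that have expired."""
        self.height += 1
        expired = [key for key, (written_at, _) in self.entries.items()
                   if self.height - written_at > self.blocks_to_live]
        for key in expired:
            del self.entries[key]

    def keys(self) -> Set[str]:
        return set(self.entries)


store = PrivateDataStore(blocks_to_live=42)
store.put("old-but-important", b"a")   # written at height 0
for _ in range(40):
    store.commit_block()               # height is now 40
store.put("new-but-disposable", b"b")  # written at height 40
for _ in range(3):
    store.commit_block()               # height is now 43
remaining = store.keys()
```

In this model the age of an entry is the only thing that decides its fate: the older entry is gone and the newer one survives, regardless of which of the two the application actually wanted to keep.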
The documentation states that Azure Durable Functions orchestrator code should be deterministic, because of replays. In my case, I have some data in Azure Table Storage that I need to fetch in a workflow. The workflow is recursive, the data in Azure Table Storage can change during execution, and it is OK to have stale state for ~1 min. In regular code I would rely on a memory cache to improve performance. But in orchestrations, I suppose it cannot be used directly, because this makes the workflow non-deterministic.
I can still use a cache in an activity and call it from orchestrations, but every activity call involves serialization/deserialization of inputs and outputs and passing messages through the control queue. These operations are heavier than fetching the data itself.
So my question is: is there any pattern that can be used to cache data in memory between orchestration instances, without wrapping this logic in an activity?
My suggestion is to use a distributed cache, specifically Azure Cache for Redis.
I drew an image for you:
Get your data from Azure Table Storage in your orchestration, process it there, and save it to the Redis cache. Then pass the ID of the required data to each activity, which can then read the data from the Redis cache.
This is a cache-based solution, as you asked. However, please note that if you want high-performance queries, Azure Table Storage is not the best option. I suggest using either Azure SQL or Cosmos DB instead. If you are looking for a cheap option, Table Storage is fine, but in that case Redis won't be a good fit for you either, because it is not cheap. If the Redis cache won't work for you, I would suggest you review your algorithm.
Good luck!
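The question tolerates about one minute of staleness, so whichever store backs the cache (process memory inside an activity, or Redis as suggested above), the lookup itself can be a plain time-to-live cache. A minimal Python sketch, assuming a hypothetical `fetch_row` function stands in for the Azure Table Storage read; the clock is injected so the example runs without waiting:

```python
import time
from typing import Callable, Dict, Generic, List, Tuple, TypeVar

V = TypeVar("V")


class TtlCache(Generic[V]):
    """Returns a cached value until it is older than ttl_seconds, then reloads it."""

    def __init__(self, load: Callable[[str], V], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, V]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> V:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now < hit[0]:
            return hit[1]                  # still fresh enough
        value = self._load(key)            # expired or missing: go to the source
        self._entries[key] = (now + self._ttl, value)
        return value


# Hypothetical stand-in for the table read; it records how often it is called.
calls: List[str] = []

def fetch_row(key: str) -> str:
    calls.append(key)
    return f"row-{key}-v{len(calls)}"

fake_now = [0.0]                           # controllable clock for the example
cache = TtlCache(fetch_row, ttl_seconds=60.0, clock=lambda: fake_now[0])

a = cache.get("42")        # first read hits the source
fake_now[0] = 30.0
b = cache.get("42")        # 30 s later: served from cache
fake_now[0] = 61.0
c = cache.get("42")        # 61 s later: entry expired, reloaded
```

Because the result depends on wall-clock time, a cache like this belongs in an activity or another non-orchestrator component, not in the orchestrator function itself.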
You can store data between orchestrations with entity functions.
https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-entities
An entity should be able to handle 64 operations per second:
https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-perf-and-scale#performance-targets
Aim: Suppose I have a very popular page (let's say 1 million visitors every 5 minutes) in my Azure Service Fabric based web application. I want to add some kind of cache layer between the data layer and the frontend API layer.
Solution: For this purpose, I chose a Reliable Actor exposing a single read-only method: GetFrequentlyAskedPage(). The Actor uses volatile state and has a 5-minute idle timeout, after which it is garbage collected.
Questions:
How many read operations can the Actor handle before it falls over?
Should I use the "read from secondary replicas" option for that Actor in this case?
Or maybe I am totally wrong in my reasoning and should change the implementation?
I would not recommend using actors as a cache. Actor instances force single-threaded turn-based access, meaning an actor instance can only service one request at a time. This obviously will not perform well as a cache. See here for more info: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-reliable-actors-introduction/
Instead I would recommend using a stateful Reliable Service with a Reliable Dictionary to cache data, or better yet, use a stateful Reliable Service as your data layer, in which case you don't need this cache at all.
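To put rough numbers on the turn-based limit: one million requests in five minutes is about 3,333 requests per second, and an actor that services one request at a time can complete at most one request per service time. The short Python calculation below illustrates this; the 1 ms service time is purely an assumption for the example, not a measured Service Fabric figure.

```python
requests = 1_000_000
window_seconds = 5 * 60

# Average arrival rate the cache has to sustain.
arrival_rate = requests / window_seconds            # about 3333.3 requests/second


def max_serial_throughput(service_time_seconds: float) -> float:
    """Upper bound for a component that handles one request at a time."""
    return 1.0 / service_time_seconds


assumed_service_time = 0.001                         # ASSUMPTION: 1 ms per call
capacity = max_serial_throughput(assumed_service_time)   # about 1000 requests/second
can_keep_up = capacity >= arrival_rate

# Service time below which a single serial instance could just keep up.
break_even_service_time = 1.0 / arrival_rate         # about 0.0003 s, i.e. 0.3 ms
```

Under that assumption a single actor instance would fall behind by more than a factor of three, which is why a concurrently readable store is a better fit for this workload.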
The Azure Service Fabric appears to be focused on scenarios in which all data can fit within RAM and persistence is used as a backing store. Reliable Services are designed to store information in Reliable Collections, which use a log-checkpoint system where logged information is written into RAM. Meanwhile, for Reliable Actors, the default actor state provider is "the distributed Key-Value store provided by Service Fabric platform." This seems to indicate that the same limitations would apply.
There may, however, be situations in which one would like to use the Service Fabric for "hot data" but write "cold data" to some form of permanent storage. What are best practices for handling this transition?
In Orleans, this seems to be handled automatically, using a persistence store such as Azure tables. But it seems that a principal design purpose of Service Fabric and Reliable Collections is to avoid needing external services, thus enhancing data locality. The current documentation anticipates the possibility that one would want to move data into some permanent store for disaster recovery and analytics, but it does not discuss the possibility of moving data back and forth between persistence-backed in-memory actors and more permanent forms of storage.
A possible answer is that the Service Fabric already does this. Maybe a Reliable Dictionary has some built-in mechanism for switching between persistence-backed in-memory storage and permanent storage.
Or maybe the answer is that one must manage this oneself. One approach might be for an Actor to keep track of how "hot" it is and switch its persistence store as necessary. But this sacrifices one of the benefits of the Actor model, the automatic allocation and deallocation of actors. Similarly, we might periodically remove items from the Reliable Dictionary, add them to some other persistence store, and later add them back. Again, though, this requires knowing when it makes sense to make the transition.
A couple of examples may help crystallize this:
(1) Suppose that we are implementing a multiplayer game with many different "rooms." We don't need all the rooms in memory at once, but we need to move them into memory and use local persistence as a backup once players join them.
(2) Suppose that we are implementing an append-only B-Tree as part of a database. The temptation would be to have each B-Tree node be a stateful actor. We would like hot b-trees to remain in memory but of course the entire index can't be in memory. It seems that this is a core scenario that is already implemented for things like DocumentDB, but it's not clear to me from the documentation how one would do this.
A related question that I found is here. But that question focuses on when to use Azure Service Fabric vs. external services. My question is on whether there is a need to transition between them, or whether Azure Service Fabric already has all the capability needed here.
The Key-Value store state provider does not require everything to be kept in memory. This provider actually stores the state of all actors on the local disk and the state is also replicated to the local disk on other nodes. So the KVS store is considered a persistent and reliable store.
In addition to that, the state of active actors is also stored in memory. When an actor hasn't been used in a while, it gets deactivated and garbage collected. When this happens, the in-memory copy is freed and only the copy on disk remains. When the actor is activated again, the state is fetched from disk and remains in memory as long as the actor is active.
Also, KVS is not the only built-in state provider. We also have the VolatileActorStateProvider (http://azure.microsoft.com/en-gb/documentation/articles/service-fabric-reliable-actors-platform/#actor-state-provider-choices). This is the state provider that keeps everything in memory.
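The activate/deactivate behaviour described above can be pictured with a small simulation. This is a toy Python model, not the Service Fabric runtime: `ActorHost`, its `disk` and `memory` dictionaries and the explicit `now` timestamps are all hypothetical, chosen only to show that the durable copy outlives the in-memory one.

```python
import copy
from typing import Any, Dict


class ActorHost:
    """Toy model of a persisted state provider with in-memory activation."""

    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout
        self.disk: Dict[str, Dict[str, Any]] = {}     # durable copy of every actor's state
        self.memory: Dict[str, Dict[str, Any]] = {}   # state of active actors only
        self.last_used: Dict[str, float] = {}

    def call(self, actor_id: str, now: float) -> Dict[str, Any]:
        """Activate the actor if needed and return its in-memory state."""
        if actor_id not in self.memory:
            # Activation: load the state from disk (empty for a brand-new actor).
            self.memory[actor_id] = copy.deepcopy(self.disk.get(actor_id, {}))
        self.last_used[actor_id] = now
        return self.memory[actor_id]

    def save(self, actor_id: str) -> None:
        """Persist the active actor's current state to disk."""
        self.disk[actor_id] = copy.deepcopy(self.memory[actor_id])

    def collect(self, now: float) -> None:
        """Deactivate actors that have been idle longer than the timeout."""
        idle = [a for a, t in self.last_used.items()
                if a in self.memory and now - t > self.idle_timeout]
        for actor_id in idle:
            del self.memory[actor_id]   # only the in-memory copy is freed


host = ActorHost(idle_timeout=300.0)
state = host.call("room-1", now=0.0)
state["players"] = 3
host.save("room-1")

host.collect(now=301.0)                      # idle for 301 s: deactivated
active_after_gc = "room-1" in host.memory
restored = host.call("room-1", now=302.0)    # reactivated from the disk copy
```

With a volatile provider there would be no `disk` dictionary in this picture, so everything would have to fit in `memory`.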
The KvsActorStateProvider does indeed store actor state in a KeyValueStore which is a similar structure to the ReliableDictionary.
The first question I'd ask is whether you need to relegate old actors' state to cold storage at all. The limitation of keeping everything in memory doesn't limit you to a total number of actors, but to a total number per replica. So you must first consider the partitioning strategy, so that your actors are distributed across a number of different replicas. As your demands grow, you can then add more machines to the cluster and Service Fabric will orchestrate the movement of replicas to the new machines. For more information on partitioning of the Actor service, see http://azure.microsoft.com/en-gb/documentation/articles/service-fabric-reliable-actors-platform/
If you do want to use cold storage after some time, then you have a couple of options. Firstly, you could decorate your actors with a custom ActorStateProviderAttribute that returns your own implementation of an IActorStateProvider that can handle persistence as you decide.
Alternatively, you could handle it entirely within your Actor implementation. Hook into the Actor lifecycle: in OnDeactivateAsync (called when the instance is about to be garbage collected), or from an Actor Reminder set for some specified time in the future, serialise the state, store it in cold storage such as blob or table storage, and null out the State property. The OnActivateAsync override can then be used to retrieve this state from cold storage and deserialise it.
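The second option can be sketched as follows. This is an illustration in Python rather than the actual Actor API: `on_activate` and `on_deactivate` are plain methods standing in for the lifecycle overrides, and `ColdStore` is a hypothetical in-memory substitute for blob or table storage.

```python
import json
from typing import Any, Dict, Optional


class ColdStore:
    """Stand-in for blob or table storage: maps a key to serialised text."""

    def __init__(self) -> None:
        self.blobs: Dict[str, str] = {}

    def write(self, key: str, payload: str) -> None:
        self.blobs[key] = payload

    def read(self, key: str) -> Optional[str]:
        return self.blobs.get(key)


class RoomActor:
    """Keeps its state in memory while active and in cold storage otherwise."""

    def __init__(self, actor_id: str, cold: ColdStore) -> None:
        self.actor_id = actor_id
        self.cold = cold
        self.state: Optional[Dict[str, Any]] = None

    def on_activate(self) -> None:
        # Pull the state back from cold storage, or start empty for a new actor.
        payload = self.cold.read(self.actor_id)
        self.state = json.loads(payload) if payload is not None else {}

    def on_deactivate(self) -> None:
        # Serialise the state to cold storage, then drop the in-memory copy.
        self.cold.write(self.actor_id, json.dumps(self.state))
        self.state = None


cold = ColdStore()

actor = RoomActor("room-7", cold)
actor.on_activate()
actor.state["score"] = 12
actor.on_deactivate()                # state now lives only in cold storage

revived = RoomActor("room-7", cold)  # a later activation of the same actor ID
revived.on_activate()
```

The trade-off is that every activation pays for a round trip to cold storage, so this suits actors that are activated rarely and stay active for a while.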
I'm working on an integration project where a third party will call our web service in Azure. For performance reasons, I would like to store the data from 2 tables (more than 1000 records) in the AppFabric cache.
Could anyone please suggest if this is the right design pattern?
Depending on how much data this is (you don't mention how wide the tables are), you have a couple of options.
You could certainly store it in the Azure cache, though this will cost money.
You might also want to consider storing the data in the HTTP runtime cache, which is free but not distributed.
Your choice would largely depend on the size of the data, how often it changes, and what the effect is if someone receives slightly out-of-date data.