How to store temporary data in an Azure multi-instance (scale set) virtual machine?

We developed a server service that (in a few words) supports communication between two devices. We want to take advantage of the scalability offered by an Azure Scale Set (multi-instance VM), but we are not sure how to share memory between the instances.
Our service basically stores temporary data on the local virtual machine, and this data is read, modified and sent to the devices connected to the server.
If this data is stored locally on one of the instances, the other instances cannot access it and do not have the same information. Is that correct?
If one of the devices starts making requests to the server, the instance that processes each request will not always be the same, so in the end the data is spread across instances.
So the question is: how can we share memory between Azure instances?

Depending on the type of data you want to share and how much latency matters, you could use Service Fabric (low latency, but you need to re-architect/re-build parts of your solution) or a shared back-end repository: Redis Cache is ideal as a distributed cache; SQL Azure if you want to use a relational DB to store the data; storage queues/blob storage; or File storage in a storage account (this lets you simply write to a mounted network drive from both VM instances). DocumentDB is another option, which is suited to storing JSON data.
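To make the problem and the shared-repository fix concrete, here is a minimal Python sketch (Python is used only for illustration; the `StateStore` interface, class names and key format are hypothetical). `InMemoryStore` stands in for whichever shared back end you pick, such as Redis; the idea is that instances keep no device state of their own and read and write through one shared store.

```python
from typing import Dict, Optional, Protocol


class StateStore(Protocol):
    """Hypothetical interface; Redis, a SQL table or a file share could sit behind it."""

    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Stand-in for a shared back end so the sketch runs locally."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


class Instance:
    """One VM in the scale set; it keeps no device state locally."""

    def __init__(self, name: str, store: StateStore) -> None:
        self.name = name
        self.store = store

    def handle_update(self, device_id: str, payload: str) -> None:
        self.store.set(f"device:{device_id}", payload)

    def handle_read(self, device_id: str) -> Optional[str]:
        return self.store.get(f"device:{device_id}")


# Each instance with its own local store reproduces the problem in the question.
vm_a, vm_b = Instance("vm-a", InMemoryStore()), Instance("vm-b", InMemoryStore())
vm_a.handle_update("dev-1", "hello")
print(vm_b.handle_read("dev-1"))  # None: vm-b never saw the write

# Pointing both instances at one shared store fixes it.
shared = InMemoryStore()
vm_a, vm_b = Instance("vm-a", shared), Instance("vm-b", shared)
vm_a.handle_update("dev-1", "hello")
print(vm_b.handle_read("dev-1"))  # hello
```

Whichever instance the load balancer picks, it sees the same data, at the cost of a network hop to the shared store on every access.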

You could use Service Fabric and take advantage of Reliable Collections to have your state automagically replicated across all instances.
From https://azure.microsoft.com/en-us/documentation/articles/service-fabric-reliable-services-reliable-collections/:
The classes in the Microsoft.ServiceFabric.Data.Collections namespace provide a set of out-of-the-box collections that automatically make your state highly available. Developers need to program only to the Reliable Collection APIs and let Reliable Collections manage the replicated and local state.
The key difference between Reliable Collections and other high-availability technologies (such as Redis, Azure Table service, and Azure Queue service) is that the state is kept locally in the service instance while also being made highly available.
Reliable Collections can be thought of as the natural evolution of the System.Collections classes: a new set of collections that are designed for the cloud and multi-computer applications without increasing complexity for the developer. As such, Reliable Collections are:
Replicated: State changes are replicated for high availability.
Persisted: Data is persisted to disk for durability against large-scale outages (for example, a datacenter power outage).
Asynchronous: APIs are asynchronous to ensure that threads are not blocked when incurring IO.
Transactional: APIs utilize the abstraction of transactions so you can manage multiple Reliable Collections within a service easily.
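The replicated, transactional behaviour quoted above can be modelled with a toy Python sketch. This is NOT the Service Fabric API (the real Reliable Collections are C# types in Microsoft.ServiceFabric.Data.Collections); the `Replica` and `Transaction` classes are hypothetical and only show the idea that buffered writes reach every replica together on commit, or none at all on abort.

```python
from typing import Dict, List, Optional


class Replica:
    """One copy of the service state (toy)."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}


class Transaction:
    """Buffers writes; nothing reaches any replica until commit()."""

    def __init__(self, replicas: List[Replica]) -> None:
        self._replicas = replicas  # replicas[0] plays the primary
        self._pending: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._pending[key] = value

    def get(self, key: str) -> Optional[str]:
        # Inside the transaction you see your own writes, else the committed value.
        if key in self._pending:
            return self._pending[key]
        return self._replicas[0].state.get(key)

    def commit(self) -> None:
        # Toy replication: the whole batch is applied to every replica.
        for replica in self._replicas:
            replica.state.update(self._pending)
        self._pending.clear()

    def abort(self) -> None:
        self._pending.clear()


replicas = [Replica() for _ in range(3)]

tx = Transaction(replicas)
tx.set("session:42", "connected")
tx.abort()  # discarded: no replica ever sees it

tx = Transaction(replicas)
tx.set("session:42", "connected")
tx.set("session:43", "idle")
print(tx.get("session:43"))  # idle (read-your-own-write before commit)
tx.commit()  # both keys land on every replica together
print(replicas[2].state)
```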
Working with Reliable Collections -
https://azure.microsoft.com/en-us/documentation/articles/service-fabric-work-with-reliable-collections/

Related

Service fabric Stateful service - Scaling without partitioning?

I am planning to migrate my existing cloud monolithic Restful Web API service to Service fabric in three steps.
The in-process memory cache is used heavily in my cloud service.
Step 1) Migrate the cloud service to an SF stateful service with 1 replica and a single partition. The cache code stays as is; no Reliable Collections are used.
Step 2) Scale the monolithic SF stateful service horizontally to 5 replicas and a single partition. The cache code is modified to use Reliable Collections.
Step 3) Break the monolithic SF service down into microservices (stateless/stateful).
Is the above approach clean? Any recommendations? Any drawbacks?
More on Step 2) Horizontal scaling of SF stateful service
I am not planning to use the SF partitioning strategy, as I cannot think of a uniform data distribution in my application.
By adding more replicas and no partitioning to the SF stateful service, I am just making my service more reliable (available). Is my understanding correct?
I will modify the cache code to use a Reliable Collection (Reliable Dictionary). The same state data will be available in all replicas.
I understand that a GET can be executed on any replica, but updates/writes need to be executed on the primary replica?
How can I scale my SF stateful service without partitioning?
Can all of the replicas, including the secondaries, listen to client requests and respond the same way? GET should work, but how do PUT and POST calls work?
Should I prefer an external cache store (Redis) over Reliable Collections at this step, and use a stateless service?
This document has a good overview of options for scaling a particular workload in Service Fabric and some examples of when you'd want to use each.
Option 2 (creating more service instances, dynamically or upfront) sounds like it would map to your workload pretty well. Whether you decide to use a custom stateful service as your cache or use an external store depends on a few things:
Whether you have the space in your main compute machines to store the cached data
Whether your service can get away with a simple cache or needs more advanced features provided by other caching services
Whether your service needs the performance improvement of a cache in the same set of nodes as the web tier, or whether it can afford the latency of calling out to a remote service
Whether you can afford to pay for a caching service, or whether you want to make do with the memory, compute, and local storage you're already paying for with the VMs
Whether you really want to take on building and running your own cache
To answer some of your other questions:
Yes, adding more replicas increases availability/reliability, not scale. In fact it can have a negative impact on performance (for writes) since changes have to be written to more replicas.
The state data isn't guaranteed to be the same in all replicas, just a majority of them. Some secondaries can even be ahead, which is why reading from secondaries is discouraged.
So to your next question, the recommendation is for all reads and writes to always be performed against the primary so that you're seeing consistent quorum committed data.
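A toy Python model of the quorum behaviour described above, assuming a single partition with 5 replicas as in the question. The classes are hypothetical, not Service Fabric types, and the model is deliberately simple (it does not reproduce secondaries that are ahead): a write commits once a majority of replicas, including the primary, has it; lagging secondaries may not have it yet; reads are routed to the primary. It also shows why more replicas add write cost, since every write touches every reachable replica.

```python
from typing import Dict, List, Optional


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state: Dict[str, str] = {}
        self.reachable = True


class Partition:
    """Toy single partition; replicas[0] is the primary."""

    def __init__(self, replica_count: int) -> None:
        self.replicas: List[Replica] = [Replica(f"r{i}") for i in range(replica_count)]
        self.quorum = replica_count // 2 + 1  # majority

    @property
    def primary(self) -> Replica:
        return self.replicas[0]

    def write(self, key: str, value: str) -> bool:
        targets = [r for r in self.replicas if r.reachable]
        if not self.primary.reachable or len(targets) < self.quorum:
            return False  # no write quorum: reject rather than apply partially
        for replica in targets:  # more replicas means more work per write
            replica.state[key] = value
        return True

    def read(self, key: str) -> Optional[str]:
        # All reads go to the primary, which only holds quorum-committed data here.
        return self.primary.state.get(key)


p = Partition(5)  # 5 replicas, 1 partition, as in the question
p.replicas[3].reachable = False
p.replicas[4].reachable = False
print(p.write("cache:user:1", "v1"))  # True: 3 of 5 is a majority
print(p.read("cache:user:1"))  # v1, served by the primary
print(p.replicas[4].state.get("cache:user:1"))  # None: this secondary is behind
```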

Azure Service Fabric Reliable Collection and other Persistent Store

I am very new to Service Fabric.
Does Service Fabric recommend using only Reliable Collections to store ALL the data for an application?
What if I use SQL DB to persist all my business data and use a Reliable Collection to lazily persist to SQL DB for integration purposes? Following DDD, if I persist my aggregate to SQL DB and leave an entry in a Reliable Collection to communicate with other Bounded Contexts, will this approach have any issues?
Service Fabric does NOT recommend storing all the data in Reliable Collections. It's your choice. Service Fabric gives you freedom in how you do things, on many levels.
You can use an external DB (like SQL DB, DocumentDB, or anything else) and use the stateful service as a cache. Or use the stateful service as the primary storage and don't use an external DB at all.
Even though Reliable Collections are somewhat limited in use (a key/value store with no effective query interface other than looping over all the data), they have the advantage of being stored internally (performance) and have good fail-safe mechanisms (you can define as many secondary replicas as you want). The partitioning capabilities should not be forgotten either.
Personally, I tend to minimize external dependencies. An external DB is a dependency. But if your application's requirements call for extensive query capabilities, go for it.
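One way to combine the two options, sketched in Python under stated assumptions: an external SQL database as the source of truth and an in-process dictionary as a cache in front of it (the cache-aside pattern). `sqlite3` stands in for SQL DB, a plain `dict` stands in for state kept in a stateful service, and the table and column names are hypothetical.

```python
import sqlite3
from typing import Dict, Optional


class CachedRepository:
    """Cache-aside: the SQL database is the source of truth, the dict only a cache."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.cache: Dict[str, str] = {}  # stand-in for state kept in the service
        self.db_reads = 0

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        row = self.conn.execute(
            "SELECT item_value FROM items WHERE item_key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        self.cache[key] = row[0]  # populate the cache on a miss
        return row[0]

    def put(self, key: str, value: str) -> None:
        # Write to the database first (committed by the context manager), then the cache.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO items (item_key, item_value) VALUES (?, ?)",
                (key, value),
            )
        self.cache[key] = value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_key TEXT PRIMARY KEY, item_value TEXT)")
conn.execute("INSERT INTO items VALUES ('order:1', 'pending')")
conn.commit()

repo = CachedRepository(conn)
print(repo.get("order:1"), repo.db_reads)  # pending 1 (miss, loaded from SQL)
print(repo.get("order:1"), repo.db_reads)  # pending 1 (hit, no extra SQL read)
repo.put("order:1", "shipped")
```

The dictionary supports only key lookups, mirroring the limitation mentioned above; any real querying still goes to SQL.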
According to Microsoft's Reliable Actors anti-patterns guidance:
Treat Reliable Actors as a transactional system. Service Fabric Reliable Actors is not a two phase commit-based system offering ACID. If we do not implement the optional persistence, and the machine the actor is running on dies, its current state will go with it. The actor will be coming up on another node very fast, but unless we have implemented the backing persistence, the state will be gone. However, between leveraging retries, duplicate filtering, and/or idempotent design, you can achieve a high level of reliability and consistency.
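The retries, duplicate filtering and idempotent design mentioned in the quote can be illustrated with a minimal Python sketch. The message ids, the counter and `flaky_send` are hypothetical; in a real actor the set of processed ids would have to be persisted atomically with the state it protects, which this in-memory toy does not do.

```python
from typing import Callable, Dict, Set


class IdempotentCounter:
    """Applies each message at most once, so redelivery is harmless."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = {}
        self.seen: Set[str] = set()  # duplicate filter keyed by message id

    def handle(self, message_id: str, account: str, amount: int) -> None:
        if message_id in self.seen:
            return  # duplicate delivery: already applied
        self.totals[account] = self.totals.get(account, 0) + amount
        self.seen.add(message_id)


def send_with_retry(send: Callable[[], None], attempts: int = 3) -> None:
    """Retry a send that may fail; safe only because the receiver is idempotent."""
    for attempt in range(attempts):
        try:
            send()
            return
        except ConnectionError:
            if attempt == attempts - 1:
                raise


counter = IdempotentCounter()
calls = {"n": 0}


def flaky_send() -> None:
    calls["n"] += 1
    counter.handle("msg-1", "acct-7", 10)
    if calls["n"] == 1:
        raise ConnectionError("ack lost")  # delivered, but the sender never hears back


send_with_retry(flaky_send)
print(counter.totals)  # {'acct-7': 10}, not 20
```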
https://acom-feature-videos-twitter-card.azurewebsites.net/en-us/documentation/articles/service-fabric-reliable-actors-anti-patterns/

How to store (and query) the MaxMind GeoIP2 database in Azure?

In an Azure Web App I need to efficiently query the MaxMind GeoIP2 City database (due to the volume of queries and the latency requirements, we cannot use MaxMind's REST API).
I'm wondering what the best approach is for storing the DB (binary MMDB format, accessed via the official .NET API) so that it's easy to update with minimal downtime (we are going to subscribe to monthly updates) and still cost-effective with regard to Azure storage and transactions.
Apparently block blobs are the way to go, but I'm not sure about the monthly updates, or about the fact that the GeoIP2 API loads the whole DB into memory (I do not know whether this would be a problem for the Web App, whether I need a web worker to keep it up, or whether I need something else). I also do not know yet how large the file is.
What's the most cost-effective solution that preserves low latency over a huge volume of queries?
According to the API docs you must have the database available in a file system (the API doesn't know anything about Azure storage and related REST API). So, regardless where you permanently store it, you'll need to have it on a disk somewhere.
I have no idea how large the database footprint is, but Web Apps, Cloud Services (web/worker roles) and Virtual Machines (whether Linux or Windows) all have local disks. And you have read/write access to these disks. So, you'd need to copy the database binary file (or csv) to local disk from somewhere. At this point, when you initialize the SDK, you'd create a DatabaseReader and point it to your locally-downloaded copy of the database file.
You mentioned storing the database in blob storage. There's nothing stopping you from doing so and simply downloading a copy to local disk. And there's nothing stopping you from storing multiple versions in multiple blobs. Note: You may also take advantage of Azure File storage (an SMB share). Which you choose is up to you.
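Following that approach, here is a minimal Python sketch of installing the monthly update without downtime, assuming each version is kept under its own name as suggested above. `open_reader` stands in for creating the real MaxMind reader, `load_into_memory` is a hypothetical stub, the `geoip-<version>.mmdb` naming is made up, and the bytes would come from your blob or file-share download. Each version is written to a temp file and renamed into place, the new reader is opened, and only then is the reference swapped, so lookups never use a half-written file.

```python
import os
import tempfile
import threading
from typing import Callable, Generic, Optional, TypeVar

R = TypeVar("R")


class HotSwappableDatabase(Generic[R]):
    """Installs new database versions locally and swaps readers without a restart."""

    def __init__(self, directory: str, open_reader: Callable[[str], R]) -> None:
        self.directory = directory
        self.open_reader = open_reader  # e.g. a factory for the real MaxMind reader
        self._lock = threading.Lock()
        self._reader: Optional[R] = None
        self.current_path: Optional[str] = None

    def install(self, version: str, data: bytes) -> None:
        final_path = os.path.join(self.directory, f"geoip-{version}.mmdb")
        fd, tmp_path = tempfile.mkstemp(dir=self.directory, suffix=".tmp")
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, final_path)  # renamed into place only after the write finished
        new_reader = self.open_reader(final_path)  # open before swapping
        with self._lock:
            self._reader, self.current_path = new_reader, final_path

    def reader(self) -> R:
        with self._lock:
            if self._reader is None:
                raise RuntimeError("no database version installed yet")
            return self._reader


def load_into_memory(path: str) -> bytes:
    """Hypothetical stub reader: loads the whole file, as the question says GeoIP2 does."""
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as workdir:
    db = HotSwappableDatabase(workdir, load_into_memory)
    db.install("v1", b"first month")   # bytes would come from the blob download
    db.install("v2", b"second month")  # monthly update, no restart
    print(db.reader())  # b'second month'
```

Old version files can be deleted once nothing holds a reader on them; keeping the previous one around also gives you a quick rollback.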
As far as most cost effective solution: You'll need to do the pricing workup yourself to see what's most effective. You'd also need to evaluate how much RAM is available for the given size VM/role instance/Web App you choose. You mentioned Web Apps in your question: Web App instances scale from 0.5GB to 14GB, depending on the tier you choose (again, you'll need to evaluate this).

Storing a large amount of state in a service fabric cluster

I have a scenario where we need to store hundreds of GB (x*100 GB) of data. In general, the data is a good candidate for persistent actor state (well-partitioned, used only by the specific actors) in the Service Fabric cluster itself.
Is Service Fabric persistent state storage recommended for data of this scale? (Our compute load is going to be fairly low, so adding VMs just to store the state is not a desirable option.)
How does the amount of persistent state affect the latency of moving partitions between nodes in the cluster?
Well let's look at how state is stored in a service (this applies to actors too).
The component that stores your data in your service is called a State Provider. State providers can be in-memory only or in-memory + local disk. The default state provider you get with an actor service is in-memory + local disk but it only keeps hot data in memory so your storage requirements are not memory bound. Contrast with the Reliable Collections state provider which currently stores all data both in-memory and on local disk, although in a future release it will also have an option to only keep hot data in memory and offload the rest to local disk.
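A toy Python model of the "hot data in memory, everything on local disk" behaviour described above. The class is hypothetical (it is not the actor state provider) and a dict stands in for the local disk; it only shows why memory use is bounded by the number of hot entries while total capacity is bounded by disk.

```python
from collections import OrderedDict
from typing import Dict, Optional


class HotColdStore:
    """Every value goes to 'disk'; only the most recently used ones stay in memory."""

    def __init__(self, memory_slots: int) -> None:
        self.memory_slots = memory_slots
        self.memory: "OrderedDict[str, str]" = OrderedDict()
        self.disk: Dict[str, str] = {}  # stand-in for the local-disk copy

    def set(self, key: str, value: str) -> None:
        self.disk[key] = value  # durable copy is always written
        self._cache(key, value)

    def get(self, key: str) -> Optional[str]:
        if key in self.memory:
            self.memory.move_to_end(key)  # mark as hot
            return self.memory[key]
        value = self.disk.get(key)  # cold read from disk
        if value is not None:
            self._cache(key, value)
        return value

    def _cache(self, key: str, value: str) -> None:
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.memory_slots:
            self.memory.popitem(last=False)  # evict the least recently used entry


store = HotColdStore(memory_slots=2)
for k in ("a", "b", "c"):
    store.set(k, k.upper())
print(list(store.memory))  # ['b', 'c']
print(store.get("a"))  # A (cold read from disk)
print(list(store.memory))  # ['c', 'a']
```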
Given that you are using actors, you can use the default actor state provider which means your data capacity is limited by local disk storage on your machines or VMs, which should be reasonable for storing 100s of GB. We generally don't move entire partitions around, but occasionally Service Fabric does need to rebuild a replica of your service, and the more data you have the longer it will take to build a replica. However, this doesn't really affect the latency of your service, because you have multiple replicas in a stateful service and you usually have enough replicas up that you don't need to wait for another to be rebuilt. Rebuilding a replica is usually something that happens "off to the side."
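A back-of-the-envelope Python sketch of both points above. The 300 GB state size and 100 MB/s copy throughput are assumed figures for illustration, not measurements, and the estimate ignores catching up on writes made during the copy. The second function assumes a majority write quorum and shows that a 3-replica service losing one replica can still accept writes while the replacement is being built.

```python
def rebuild_time_minutes(state_gb: float, copy_mb_per_s: float) -> float:
    """Rough time to copy one replica's state to a new node (catch-up ignored)."""
    return state_gb * 1024 / copy_mb_per_s / 60


def writes_available(target_replicas: int, replicas_down: int) -> bool:
    """Majority quorum: writes continue while more than half the replicas are up."""
    up = target_replicas - replicas_down
    return up >= target_replicas // 2 + 1


print(round(rebuild_time_minutes(300, 100), 1))  # 51.2 (assumed 300 GB at 100 MB/s)
print(writes_available(3, 1))  # True: 2 of 3 replicas still form a majority
print(writes_available(3, 2))  # False
```

Doubling the state doubles the rebuild estimate, but as long as a quorum stays up the rebuild happens "off to the side".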
It's true that it's not economical to add VMs just for storing state, but keep in mind that you can pack as many services onto your VMs as you want. So even though your actor service isn't using much compute, you can always pack other services on those VMs to use up that compute so that you're maximizing both compute and storage on your VMs, which can in fact be very economical.

Azure Traffic Manager for Cloud Services - What about storage access?

I have finally found the time to start looking at Azure. It looks good, and scaling is easy.
Azure SQL, Table Storage and Blob Storage should cover most of my needs: fast access to data, automatic replication and failover to another datacenter.
Should the idea come up for an app that needs fast global access, Traffic Manager is there, and users can be routed for "Failover" or "Performance".
The "Performance" method is very nice for Cloud Services and Web Roles/Worker Roles ... BUT ... what about access to data in SQL Azure/Table Storage/Blob Storage?
I have tried searching the web for how to handle this, but haven't found anything about Traffic Manager that explains how to access data in such a scenario.
Have I missed anything?
Do people access the storage in the original data center (and if that fails use the Geo Replication feature)? Is that fast enough? Is internal traffic on the MS network free across datacenters?
This seems like such a simple ...
Take a look at Microsoft's guidance: Replicating, Distributing, and Synchronizing Data. You could use the Service Bus to keep datacenters in sync. This can cover SQL databases, storage, search indexes like Solr, Elasticsearch, ... The advantage over solutions like SQL Data Sync is that it's technology-independent and can keep virtually all your data in sync.
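The Service Bus approach boils down to publish/subscribe: each datacenter writes locally, publishes the change, and applies the changes published by the others. A toy Python model (this is not the Azure Service Bus API; `Bus`, `Datacenter` and the region names are hypothetical) makes the trade-off visible: remote copies are only eventually consistent.

```python
from collections import deque
from typing import Deque, Dict, Tuple

Change = Tuple[str, str]  # (key, value)


class Bus:
    """Toy topic: every published change is copied to each other subscriber's queue."""

    def __init__(self) -> None:
        self.subscriptions: Dict[str, Deque[Change]] = {}

    def subscribe(self, name: str) -> Deque[Change]:
        return self.subscriptions.setdefault(name, deque())

    def publish(self, sender: str, change: Change) -> None:
        for name, queue in self.subscriptions.items():
            if name != sender:
                queue.append(change)


class Datacenter:
    def __init__(self, name: str, bus: Bus) -> None:
        self.name = name
        self.bus = bus
        self.inbox = bus.subscribe(name)
        self.store: Dict[str, str] = {}  # could be SQL, table storage, a search index...

    def write(self, key: str, value: str) -> None:
        self.store[key] = value  # the local write is fast
        self.bus.publish(self.name, (key, value))

    def pump(self) -> None:
        # Apply changes published by other datacenters, in arrival order.
        while self.inbox:
            key, value = self.inbox.popleft()
            self.store[key] = value


bus = Bus()
west = Datacenter("west-europe", bus)
north = Datacenter("north-europe", bus)
west.write("user:1", "alice")
print(north.store.get("user:1"))  # None: not applied yet
north.pump()
print(north.store.get("user:1"))  # alice
```

The toy ignores conflicting concurrent writes; a real design needs a rule for those (for example, a single owning datacenter per record).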
In this episode of Channel 9 they state that Traffic Manager is only for Cloud Services as of now (Jan 2014) but support is coming for Azure Web Sites and other services. I agree that you should be able to ask for a Blob using a single global URL and expect that the content will be served from the closest datacenter.
There isn't a one-click, easy-to-implement solution for this issue. The way you solve it will depend on where the data lives (i.e. SQL Azure, Blob storage, etc.) and your access patterns.
Do you have a small number of data requests that are not on a performance critical path in your code? Consider just using the main datacenter.
Do you have a large number of read-only type of requests? Consider doing a replication of the data to another datacenter.
Do you do a large number of reads and only a few writes? Consider duplicating the data across all datacenters: each write goes to all datacenters at the same time (incurring a performance penalty), and all reads go to the local datacenter (fast reads).
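The "write to all, read locally" option can be sketched in Python as follows. The region names and the `RegionClient` class are hypothetical, and dicts stand in for each datacenter's store; the counter makes the cost visible: every write makes one cross-datacenter call per remote region, while reads never leave the local region.

```python
from typing import Dict, Optional

Copies = Dict[str, Dict[str, str]]  # region -> that region's copy of the data


class RegionClient:
    """App instance in one region: writes to every region, reads only locally."""

    def __init__(self, copies: Copies, local_region: str) -> None:
        self.copies = copies
        self.local_region = local_region
        self.remote_calls = 0

    def write(self, key: str, value: str) -> None:
        for region, copy in self.copies.items():
            copy[key] = value
            if region != self.local_region:
                self.remote_calls += 1  # a cross-datacenter round trip (slow)

    def read(self, key: str) -> Optional[str]:
        return self.copies[self.local_region].get(key)  # local, fast


copies: Copies = {"west-europe": {}, "east-us": {}, "southeast-asia": {}}
eu = RegionClient(copies, "west-europe")
us = RegionClient(copies, "east-us")
eu.write("product:7", "in stock")
print(eu.remote_calls)  # 2
print(us.read("product:7"))  # in stock, read without leaving east-us
```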
Is your data in SQL Azure? Consider using SQL Data Sync to keep multiple datacenters in sync.
