Virtual machine with SQL Server recovery using Premium disks - Azure

I have a VM with SQL Server and an application that serves no more than 50 users. I don't require zero downtime if my VM or datacenter has an issue, but what I do need to ensure is that I can make the app available again in less than 30 minutes.
First approach: using an Availability Set with 2 VMs won't actually work, because my SQL Server lives in the same VM, and I don't think an Availability Set takes care of real-time replication of my SQL Server data; it only cares about the availability of the VMs themselves, not the persistent data (if I'm wrong, please let me know). So, given the above, an Availability Set is not for me. It would also be twice as expensive because of the two VMs.
Second approach: using a recovery site with Azure disaster recovery. I was reading that this won't guarantee zero data loss, because there is a minimum replication frequency (I think it is 1 hour), so you have to be prepared to deal with up to 1 hour of data loss, and I don't like that.
Third option: Azure Backup for SQL Server VM. This option could work; the only downside is that it has an RPO of 15 minutes, which is not that much. But the problem is that if for some reason users generate critical records in the app during that window, we won't be able to get them back into the app, because users always destroy their source material right away once they register it in the app.
Fourth approach: Because I don't really require a zero-downtime app, I was thinking of just keeping the current VM and using two Premium disks: one for the SQL Server data files and another for the SQL Server logs. In case of a VM failure I will be notified by users immediately, and what I can do is create snapshots of the OS disk and the SQL Premium disks (three in total) and then create a new VM from those snapshots. That way I get a new working VM, possibly in a different region, holding the very last data inserted into SQL before the failure happened.
Of course, I guess I will need a load balancer on top of the VM so I can just reroute traffic to the new VM. The failed VM I will simply kill, and use the new VM as my new system. If a failure happens again, I just follow the same process; this way I only pay for one VM instead of two.
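For what it's worth, the snapshot step can be scripted ahead of time so it is ready to run the moment users report a failure. Below is a minimal sketch using the current Az PowerShell module, assuming managed disks and placeholder resource names. Two caveats: snapshots of a failed VM are only crash-consistent, so SQL Server will run crash recovery from the log when the replacement VM boots; and managed snapshots are regional, so landing in a different region requires an extra copy step.

```powershell
# Snapshot the OS disk and every data disk of a (possibly dead) VM.
# $rg and $vmName are placeholders; assumes managed disks and the Az module.
$rg     = "my-rg"
$vmName = "sql-vm"
$vm     = Get-AzVM -ResourceGroupName $rg -Name $vmName

$diskNames = @($vm.StorageProfile.OsDisk.Name) +
             ($vm.StorageProfile.DataDisks | ForEach-Object { $_.Name })

foreach ($diskName in $diskNames) {
    $disk = Get-AzDisk -ResourceGroupName $rg -DiskName $diskName
    $cfg  = New-AzSnapshotConfig -SourceUri $disk.Id -Location $disk.Location -CreateOption Copy
    New-AzSnapshot -ResourceGroupName $rg -SnapshotName "$diskName-snap" -Snapshot $cfg
}

# Replacement disks can then be built from the snapshots with
# New-AzDiskConfig -SourceResourceId <snapshot id> -CreateOption Copy
# and attached to a new VM.
```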
Has someone already tried this? Does it sound reasonable and doable, or am I missing something big, or will I perhaps not get what I expect?

You'd be better off using Azure SQL (PaaS) instead of a VM; there are many options there that fit your needs. Running the OS, the application and SQL Server in the same VM is not recommended; by changing to Azure SQL (PaaS) you can shrink the hardware of the application VM and size Azure SQL to support your 50 users. You can also use a load balancer, as you said, or either Traffic Manager (https://learn.microsoft.com/pt-br/azure/traffic-manager/traffic-manager-overview) or Application Gateway (https://learn.microsoft.com/pt-br/azure/application-gateway/overview), to route traffic to the VMs where the application is running. Depending on your application, you could migrate it to an Azure Web App (https://learn.microsoft.com/en-us/azure/app-service/).
With Azure SQL (PaaS) you can certainly get under 30 minutes; I would say almost zero downtime, even though you don't require it.
Automatic backups and Point-in-time restores
https://learn.microsoft.com/pt-br/azure/sql-database/sql-database-automated-backups
Active geo-replication
https://learn.microsoft.com/pt-br/azure/sql-database/sql-database-active-geo-replication
Zone-redundant databases
https://learn.microsoft.com/pt-br/azure/sql-database/sql-database-high-availability
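To illustrate the first item in that list, a point-in-time restore can be driven entirely from PowerShell. A sketch with the Az module, with placeholder resource names; it restores a copy of the database as it was 15 minutes ago, alongside the original:

```powershell
# Restore an Azure SQL database to a new database as of 15 minutes ago.
# $rg, $server and $dbName are placeholders.
$rg     = "my-rg"
$server = "my-sql-server"
$dbName = "appdb"

$db = Get-AzSqlDatabase -ResourceGroupName $rg -ServerName $server -DatabaseName $dbName
Restore-AzSqlDatabase -FromPointInTimeBackup `
    -PointInTime (Get-Date).ToUniversalTime().AddMinutes(-15) `
    -ResourceGroupName $rg -ServerName $server `
    -TargetDatabaseName "$dbName-restored" `
    -ResourceId $db.ResourceId
```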
Finally, I don't think an Always On availability groups solution (https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/overview-of-always-on-availability-groups-sql-server?view=sql-server-ver15) is a good fit, since it is expensive and there are only 50 users. That's why I believe you'd be better off thinking of a PaaS solution for both your application and your database. Your fourth option sounds fine, but you would need to create a new VM, configure the IP, install SQL Server, configure it, and so on to bring SQL back up.
And what are users going to do if the failure happens when you are not available to fix it immediately? Your 30 minutes won't be met :)

Related

Migrating websites to the Azure platform [closed]

We have several websites hosted on a dedicated server with the following configuration:
Windows OS based dedicated server
MS SQL Server 2012 for the database
IIS 7.5+
and other software for managing the websites, such as Plesk
We have developed websites in ASP.NET Web Forms with framework 4.0 or 4.5 (about 10 websites)
We also have a few ASP.NET MVC based websites with framework 4.5 (about 5 websites)
We use the default InProc session state for sessions
Besides this, we have other software installed, for security etc.
We also have a mobile application API running on the same server, and the same application is used to send push notifications to iOS & Android devices
Now I have a few questions regarding migration to the Azure environment.
First, my big concern is session state. I want to migrate to Azure without making any changes to the code except changes to web.config. Is this possible?
Second, we use MS SQL Server 2012 as the database, and on Azure we would have to use Azure SQL Database, which I believe is not the same as MS SQL Server. Can I use Azure SQL Database, or should I stick with MS SQL Server, since my application runs on it and migrating to Azure SQL Database may create problems?
Third, let us say I choose the Web + Mobile --> App Service Standard package (which comes with up to 10 instances). What are these instances, and will an individual session always connect to the same instance?
Fourth: I have about 20 databases, one of them about 6 GB and the others about 200 MB-700 MB. Which service should I use for the databases in case I use Azure SQL Database?
Single Database or Elastic?
Can I create multiple databases under elastic mode?
Let us say I choose "100 eDTUs: 10 GB included storage per pool, 200 DBs per pool, $0.20/hour". Will I have a total of 10 GB of space for all databases in the elastic pool? And what is "per pool", and how many pools will I get in this option?
Or is MS SQL Server on a virtual machine a better option, since I can run SQL Server based sessions?
Fifth: Disk space. Let us say I choose App Service "S2: 2 Core(s), 3.5 GB RAM, 50 GB Storage, $0.200". Does the 50 GB disk space include the OS, or is it space allocated to the files we upload?
Sixth: Some of our applications are used to send push notifications to iOS & Android devices. I am not sure if they will work in the Azure environment, as they need certain ports to be open and also some sort of certificate to be installed on the server.
I have asked too many questions, as I didn't get clarity from the MS chat; they just passed links, which can be confusing at times. I hope I get clarity here.
Q: First, my big concern is session state. I want to migrate to Azure without making any changes to the code except changes to web.config. Is this possible?
If one of your concerns is code refactoring, then the model you should choose is Infrastructure-as-a-Service. In this model there is no need to change code, because the infrastructure on Azure can be similar to on-premises: you provision virtual machines to run Windows Server, SQL Server and IIS. Software versions are all your choice, with no limitation as long as the version is still supported in the Microsoft product lifecycle when procuring new software licenses.
If you'd love to modernize your web application, Azure App Service can be a good destination. Azure App Service can run code compiled against the .NET 4.0 framework. InProc session state is not guaranteed in Azure App Service, so you need to look into an alternative if you use it, for example Azure Redis Cache.
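If you do go the App Service route, provisioning that external session store is a one-liner. A sketch with the Az module and placeholder names; the ASP.NET side would then be pointed at the cache via the RedisSessionStateProvider entry in web.config:

```powershell
# Create a small Azure Redis cache to hold session state out-of-process.
# Resource group and cache name are placeholders.
New-AzRedisCache -ResourceGroupName "my-rg" `
    -Name "my-session-cache" `
    -Location "West Europe" `
    -Sku Basic -Size C0

# Retrieve the access key to plug into the session provider's web.config settings.
(Get-AzRedisCacheKey -ResourceGroupName "my-rg" -Name "my-session-cache").PrimaryKey
```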
Q: Second, we use MS SQL Server 2012 as the database, and on Azure we would have to use Azure SQL Database, which I believe is not the same as MS SQL Server. Can I use Azure SQL Database, or should I stick with MS SQL Server, since my application runs on it and migrating to Azure SQL Database may create problems?
Without an impact analysis, and without knowing how complex your data model is, it's hard to say whether Azure SQL Database is compatible with your database. Fortunately, Microsoft provides a tool named Data Migration Assistant (DMA) which helps you perform a database compatibility analysis for Azure SQL Database. This link gives you more details on DMA (https://learn.microsoft.com/en-us/azure/sql-database/sql-database-cloud-migrate). Moving from SQL Server to Azure SQL Database gains you more benefits in high availability, disaster recovery and scalability, and the administration effort for server management and OS patching is significantly reduced. With SQL Server in an Azure VM, the migration cost is lower, as you only need to lift and shift (provision the VM, perform database detach/attach or other backup/restore methods).
Q: Third, let us say I choose the Web + Mobile --> App Service Standard package (which comes with up to 10 instances). What are these instances, and will an individual session always connect to the same instance?
No, sessions are not guaranteed to be maintained. When you choose Azure App Service, your web application runs on virtualized servers running Windows Server and IIS. The term "instance" means a server instance. Azure App Service handles scaling by allocating compute resources across multiple instances, to make sure your application does not crash from inadequate memory and resources. The default when you first provision your web app is 1, but the number of instances is configurable.
Q: Fourth: I have about 20 databases, one of them about 6 GB and the others about 200 MB-700 MB. Which service should I use for the databases in case I use Azure SQL Database?
Single Database or Elastic?
Can I create multiple databases under elastic mode?
Let us say I choose "100 eDTUs: 10 GB included storage per pool, 200 DBs per pool, $0.20/hour". Will I have a total of 10 GB of space for all databases in the elastic pool? And what is "per pool", and how many pools will I get in this option?
Or is MS SQL Server on a virtual machine a better option, since I can run SQL Server based sessions?
Choosing Single Database or Elastic depends on the performance and peak load of your databases. A single database is used for an independent database, where you can specify the DTUs (Database Transaction Units) for predictable performance. An Elastic Pool is best for managing a set of databases in a pool, and is the choice for unpredictable performance and usage.
In your case, I'd recommend an Elastic Pool to manage performance. An Elastic Pool allows you to set eDTUs for the pool, no matter how many DTUs a specific database in the pool needs. Elastic Pool also monitors and performs in-depth performance analysis to give you insight and an overall picture of each database's performance.
When it comes to pools, you should not worry about how much storage is given to each database, nor about the number of databases you can store in a pool. Since you have 20 databases in total, you need only one pool.
The eDTU number you need can be calculated via this website: http://dtucalculator.azurewebsites.net/. Just run one of the given scripts from the website on your SQL Server (where your on-premises databases are running) to capture performance metrics, then upload the Excel file to the website. It will give you a number. For example, say the result is that the 20 databases together need 100 eDTUs: you then just create an Elastic Pool and set 100 eDTUs for it. However, on the Elastic Pool Basic tier you are only given 10 GB per pool, which is not enough for a worst case of 120 GB (20 * 6 GB), so you would need an Elastic Pool Standard tier at 100 eDTUs to get a 750 GB maximum. Note that you could instead choose a Basic plan of 1,200 eDTUs to get a 156 GB maximum, but this is never recommended, because storage space is much cheaper than eDTUs.
In a nutshell, from your draft info above, I'd recommend the Standard plan of Elastic Pool with 100 eDTUs. You can increase the number of eDTUs if it does not satisfy the performance of all 20 databases. No database downtime is needed when adjusting the eDTU number.
That said, creating only one pool is not necessarily my recommendation; it depends on your database workload. For example, among your 20 databases there might be 5 with heavy workloads for an ERP or business-critical system, while the rest are just normal databases. In that case you'd need two elastic pools: one pool set to a high number of eDTUs, and another with a low number.
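To make the pool mechanics concrete, here is a sketch with the Az module (placeholder names; DTU-model parameters as discussed above) that creates a Standard 100 eDTU pool and moves an existing database into it:

```powershell
# Create a Standard elastic pool with 100 eDTUs shared across its databases.
New-AzSqlElasticPool -ResourceGroupName "my-rg" -ServerName "my-sql-server" `
    -ElasticPoolName "app-pool" -Edition "Standard" `
    -Dtu 100 -DatabaseDtuMin 0 -DatabaseDtuMax 100

# Move an existing database into the pool (repeat, or loop, per database).
Set-AzSqlDatabase -ResourceGroupName "my-rg" -ServerName "my-sql-server" `
    -DatabaseName "appdb01" -ElasticPoolName "app-pool"
```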
Q: Fifth: Disk space. Let us say I choose App Service "S2: 2 Core(s), 3.5 GB RAM, 50 GB Storage, $0.200". Does the 50 GB disk space include the OS, or is it space allocated to the files we upload?
When it comes to Azure App Service, the OS is not counted. The 50 GB of storage space is given directly to your application (to store images, compiled DLLs, videos, libraries, and so on).
Q: Sixth: Some of our applications are used to send push notifications to iOS & Android devices. I am not sure if they will work in the Azure environment, as they need certain ports to be open and also some sort of certificate to be installed on the server.
Azure Notification Hubs can help you achieve push notifications. Notification Hubs allows you to use the certificate of each kind of platform (e.g. iOS) to manage devices. This is a sample reference if you are familiar with iOS: https://learn.microsoft.com/en-us/azure/notification-hubs/notification-hubs-ios-apple-push-notification-apns-get-started. Notification Hubs also supports token-based authentication for APNS if you need it.
For each case, please give more details (e.g. your mobile scenario) and specific questions if possible, so that I and people here can elaborate more.

How to update a website hosted in an Azure scale set

Let's say that I have an Azure SQL database (PaaS) with a scale set of VMs in front of it, each VM containing a website hosted in IIS. In front of the scale set I have a Traffic Manager, for website-update purposes only (whenever I need an update, I create a second scale set with VMs running the latest version, and after the second scale set is deployed, I change Traffic Manager to route traffic to the newly created scale set). The website is 100% stateless. The problem arises for me just after I create the second scale set: how would I run rollout scripts on the Azure SQL DB without disturbing clients that consume the old website version?
I am thinking of using mirroring, or something of the sort, for the DB to replicate transactions to a second Azure SQL database, while at the same time running the rollout scripts on it. I would then just have to cut traffic from the live scale set, wait for sessions to drain, and switch to the new scale set. Is this a good approach? I see that I lose the "always on" capability that I really need. I really don't know the best practice; a book or a link would be highly appreciated.
In short, I would like to remain as highly available as possible, even during planned application updates. How can I achieve this?
Try or consider having one more Azure SQL database. Your Traffic Manager setup could also indicate which of the two Azure SQL databases is the active one. Deploy changes to the passive database, then update the configuration so the passive database becomes the new active database.
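The swing between scale sets (and, by extension, between the active and passive databases) can be done by toggling Traffic Manager endpoints. A sketch with the Az module; the profile and endpoint names are assumptions:

```powershell
# Send new sessions to the v2 scale set's endpoint.
Enable-AzTrafficManagerEndpoint -Name "scaleset-v2" -Type AzureEndpoints `
    -ProfileName "app-tm-profile" -ResourceGroupName "my-rg"

# ...wait for sessions on v1 to drain, then take it out of rotation.
Disable-AzTrafficManagerEndpoint -Name "scaleset-v1" -Type AzureEndpoints `
    -ProfileName "app-tm-profile" -ResourceGroupName "my-rg" -Force
```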

Turning off Service Fabric clusters overnight

We are working on an application that processes Excel files and spits out output. Availability is not a big requirement.
Can we turn the VM sets off during the night and turn them on again in the morning? Will this kind of setup work with Service Fabric? If so, is there a way to schedule it?
Thank you all for replying. I got a chance to talk to a Microsoft Azure rep and have documented the conversation here for the community's sake.
Response to the initial question:
A Service Fabric cluster must maintain a minimum number of Primary node type instances in order for the system services to maintain a quorum and ensure the health of the cluster. You can see more about reliability levels and instance counts at https://azure.microsoft.com/en-gb/documentation/articles/service-fabric-cluster-capacity/. As such, stopping all of the VMs will cause the Service Fabric cluster to go into quorum loss. Frequently it is possible to bring the nodes back up and Service Fabric will automatically recover from this quorum loss; however, this is not guaranteed, and the cluster may never be able to recover.
However, if you do not need to save state in your cluster, then it may be easier to just delete and recreate the entire cluster (the entire Azure resource group) every day. Creating a new cluster from scratch by deploying a new resource group generally takes less than half an hour, and this can be automated by using PowerShell to deploy an ARM template. https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-creation-via-arm/ shows how to set up the ARM template and deploy it using PowerShell. You can additionally use a fixed domain name or static IP address so that clients don't have to be reconfigured to connect to the cluster. If you need to maintain other resources such as the storage account, then you could also configure the ARM template to only delete the VM scale set and the SF cluster resource while keeping the network, load balancer, storage accounts, etc.
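For reference, that nightly delete/recreate cycle could be scripted along these lines; a minimal sketch with the current Az module, assuming an existing ARM template and placeholder names:

```powershell
# Evening: tear down the whole cluster resource group.
Remove-AzResourceGroup -Name "sf-cluster-rg" -Force

# Morning: recreate the group and redeploy the cluster from the ARM template.
New-AzResourceGroup -Name "sf-cluster-rg" -Location "West Europe"
New-AzResourceGroupDeployment -ResourceGroupName "sf-cluster-rg" `
    -TemplateFile ".\cluster.json" `
    -TemplateParameterFile ".\cluster.parameters.json"
```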
Q) Is there a better way to stop/start the VMs than directly from the scale set?
If you want to stop the VMs in order to save cost, then starting/stopping the VMs directly from the scale set is the only option.
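A sketch of exactly that, with the Az module and placeholder names; Stop-AzVmss deallocates by default, which is what stops the compute billing:

```powershell
# Night: deallocate every VM in the scale set (compute billing stops).
Stop-AzVmss -ResourceGroupName "sf-cluster-rg" -VMScaleSetName "nt1vm" -Force

# Morning: start them again. The cluster must then recover from quorum loss,
# which, as noted above, is not guaranteed.
Start-AzVmss -ResourceGroupName "sf-cluster-rg" -VMScaleSetName "nt1vm"
```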
Q) Can we do a primary set with the cheapest VMs we can find and add a secondary set with powerful VMs that we can turn on and off?
Yes, it is definitely possible to create two node types: a Primary that is small/cheap, and a 'Worker' that is a larger size, with placement constraints on your application so it deploys only to those larger VMs. However, if your Service Fabric service stores state, then you will still run into a similar problem: once you lose quorum (below 3 replicas/nodes) of your worker VMs, there is no guarantee that your SF service will come back with all of its state maintained. In this case your cluster itself would still be fine, since the Primary nodes are running, but your service's state may be in an unknown replication state.
I think you have a few options:
Instead of storing state within Service Fabric's reliable collections, store your state externally in something like Azure Storage or SQL Azure. You can optionally use something like Redis Cache or Service Fabric's reliable collections to maintain a faster read cache; just make sure all writes are persisted to the external store. This way you can freely delete and recreate your cluster at any time you want.
Use Service Fabric backup/restore to maintain your state: delete the entire resource group or cluster overnight, then recreate it and restore the state in the morning. The backup/restore duration will depend entirely on how much data you are storing and where you export the backup.
Utilize something such as Azure Batch. Service Fabric is not really designed to be a temporary high-capacity compute platform that can be started and stopped regularly, so if this is your goal you may want to look at an HPC platform such as Azure Batch, which offers native capabilities to quickly burst up compute capacity.
No. You would have to delete the cluster, then recreate it and deploy the application in the morning.
Turning off the cluster is, as Todd said, not an option. However, you can scale down the number of VMs in the cluster.
During the day you run the number of VMs required. At night you can scale down to the minimum of 5. Check this page on how to scale VM scale sets: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-scale-up-down/
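A sketch of the scale-down itself, using the Az module with placeholder names; capacity is just a property on the scale set object:

```powershell
# Scale the cluster's scale set down to the 5-node minimum for the night.
$vmss = Get-AzVmss -ResourceGroupName "sf-cluster-rg" -VMScaleSetName "nt1vm"
$vmss.Sku.Capacity = 5
Update-AzVmss -ResourceGroupName "sf-cluster-rg" -VMScaleSetName "nt1vm" `
    -VirtualMachineScaleSet $vmss
```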
For development purposes, you can create a Dev/Test Lab Service Fabric cluster, which you can start and stop at will.
I have also been able to stop and start SF clusters on Azure by stopping and starting the VM scale sets associated with those clusters. But upon restart, all your applications (and with them their state) are gone and must be redeployed.

How do I make my Windows Azure application resistant to a catastrophic Azure datacenter event?

AFAIK Amazon AWS offers so-called "regions" and "availability zones" to mitigate the risks of partial or complete datacenter outage. It looks like if I have copies of my application in two "regions" and one "region" goes down, my application can still continue working as if nothing happened.
Is there something like that with Windows Azure? How do I address the risk of a catastrophic datacenter outage with Windows Azure?
Within a single data center, your Windows Azure application has the following benefits:
Going beyond one compute instance, your VMs are divided into fault domains across different physical areas. This way, even if an entire server rack went down, you'd still have compute running somewhere else.
With Windows Azure Storage and SQL Azure, storage is triple-replicated. This is not eventual replication: when a write call returns, at least one replica has been written to.
OK, that's the easy stuff. What if a data center disappears? Here are the features that will help you build DR into your application:
For SQL Azure, you can set up Data Sync. This facility synchronizes your SQL Azure database with either another SQL Azure database (presumably in another data center) or an on-premises SQL Server database. More info here. Since this feature is still considered a preview feature, you have to go here to set it up.
For Azure Storage (tables, blobs), you'll need to handle replication to a second data center yourself, as there is no built-in facility today. This can be done with, say, a background task that pulls data every hour and copies it to a storage account somewhere else. EDIT: Per Ryan's answer, there is data geo-replication for blobs and tables. HOWEVER: aside from a mention in a blog post in December, and possibly at PDC, this is not live.
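Such a copy task is straightforward to sketch with today's Az storage cmdlets (which postdate this answer); the account names, keys and container are placeholders:

```powershell
# Copy every blob in a container to a storage account in another data center.
$srcKey = "<primary-account-key>"
$dstKey = "<dr-account-key>"
$src = New-AzStorageContext -StorageAccountName "primaryacct" -StorageAccountKey $srcKey
$dst = New-AzStorageContext -StorageAccountName "dracct" -StorageAccountKey $dstKey

Get-AzStorageBlob -Container "data" -Context $src |
    Start-AzStorageBlobCopy -DestContainer "data" -DestContext $dst
# Start-AzStorageBlobCopy is asynchronous; poll with Get-AzStorageBlobCopyState.
```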
For compute availability, you can set up Traffic Manager to load-balance across data centers. This feature is currently in CTP; visit the Beta area of the Windows Azure portal to sign up.
Remember that with DR, whether in the cloud or on-premises, there are additional costs (such as bandwidth between data centers, storage costs for duplicate data in a secondary data center, and compute instances in additional data centers).
Just like with on-premises environments, DR needs to be carefully thought out and implemented.
David's answer is pretty good, but one piece is incorrect. For Windows Azure blobs and tables, your data is actually geographically replicated today between sub-regions (e.g. North and South US). This is an async process with a target lag of about 10 minutes or so. The process is also out of your control and exists purely for the data-center-loss case. In total, your data is replicated 6 times across 2 different data centers when you use Windows Azure blobs and tables (impressive, no?).
If a data center were lost, they would flip your DNS for blob and table storage over to the other sub-region, and your account would appear online again. This is true only for blobs and tables (not queues, not SQL Azure, etc.).
So, for true disaster recovery, you could use Data Sync for SQL Azure and Traffic Manager for compute (assuming you run a hot standby in another sub-region). If a datacenter were lost, Traffic Manager would route to the other sub-region, and you would find your data there as well.
The one failure mode not accounted for here is an error being replicated across data centers. In that scenario, you may want to consider running Azure PaaS as part of the HP Cloud offering, in either a load-balanced or failover scenario.

Minimize downtime in Azure

We are experiencing a very serious unscheduled downtime of our Azure application today, now coming up on 9 hours. We reported it to Azure support, and the ops team is actively trying to fix the problem; I do not doubt that. We managed to get our application running on another "test" hosted service that we have and redirected our CNAME to point at that instance, so our customers are happy, but the "main" hosted service is still unavailable.
My own "finger in the air" instinct is that the issue is network-related within our data center (West Europe), and indeed, later in the day the service dashboard went red for that region with a message to that effect. (Our application shows as "Healthy" in the portal, but is unreachable via our cloudapp.net URL. Additionally, threads within our application are logging SQL connection exceptions into our storage account, as it cannot contact the DB.)
What is very strange, though, is that the "test" instance I referred to above is in the same data centre, yet has no issues contacting the DB, and its external endpoint is fully available.
I would like to ask the community if there is anything I could have done better to avoid this downtime. I obeyed the guidance with respect to having at least two role instances per role, yet I still got burned. Should I move to a more reliable data centre? Should I deploy my application to multiple data centres? How would I manage the fact that my SQL Azure DB is in the same datacentre?
Any constructive guidance would be appreciated. Being a techie, I've never had a more frustrating day, being able to do nothing to help fix the issue.
There was an outage in the European data center today with respect to SQL Azure. Some of our clients got hit and had to move to another data center.
If you are running mission-critical applications that cannot be down, I would deploy the application into multiple regions. DNS resolution is obviously a weak link right now in Azure, but it can be worked around (if you only run a website, it can be done very simply using Response.Redirect or similar).
There is also a data synchronization service from Microsoft that will sync up multiple SQL Azure databases. Check here. This way, you can have mirror sites up in different regions and keep them in sync from the SQL Azure perspective.
It would also be a good idea to employ a third-party monitoring service that detects problems with your deployed instances externally. AzureWatch can notify you, or even deploy new nodes if you choose, when some of your instances turn "Unresponsive".
Hope this helps
I can offer some guidance based on our experience:
Host your application in multiple data centers, complete with SQL Azure databases. You can connect each application to its data-center-specific SQL server. You can also cache any external assets (images/JS/CSS) on the data-center-specific Windows Azure machine, or leverage Azure Blob Storage. Note: extra costs will be incurred.
Set up one-way SQL replication between your primary SQL Azure DB and the instance in the other data center. If you want to do bi-directional replication, take a look at the MSDN site for guidance.
Leverage Azure Traffic Manager to route traffic to the data center closest to the user. It has geo-detection capabilities, which will also improve the latency of your application: you can map http://myapp.com to the internal URL of your data center, and a user in Europe should automatically get redirected to the European data center, and vice versa for the USA. Note: at the time of writing this post, there is no way to automatically detect a failure and fail over to another data center. Manual steps will be involved once a failure is detected, and failover is a complete set (i.e. you will fail over both the Windows Azure AND SQL Azure instances). If you want micro-level failover, then I suggest putting all your config in the service config file and encrypting the values, so you can edit the connection string to connect instance X to DB Y.
You are all set now. I would create or install a local application to detect the availability of the site. A better solution would be to create a page that checks the availability of application-specific components (a diagnostic page or web service) and poll it from a local computer.
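A minimal external poller along those lines, as a PowerShell sketch (the URL and interval are placeholders):

```powershell
# Poll the app's diagnostic page once a minute and flag failures.
$url = "http://myapp.cloudapp.net/diagnostics"

while ($true) {
    try {
        $resp = Invoke-WebRequest -Uri $url -TimeoutSec 15 -UseBasicParsing
        Write-Host ("{0:s} OK ({1})" -f (Get-Date), $resp.StatusCode)
    }
    catch {
        Write-Warning ("{0:s} DOWN: {1}" -f (Get-Date), $_.Exception.Message)
        # Hook alerting in here (email, pager, webhook, etc.).
    }
    Start-Sleep -Seconds 60
}
```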
HTH
As you're deploying to Azure, you don't have much control over how SQL Server is set up. MS has already set it up so that it is highly available.
Having said that, it seems that MS has been having some issues with SQL Azure over the last few days. We've been told that it only affected "a small number of users". At one point the service dashboard had 5 data centres affected by a problem. I had 3 databases in one of those data centres go down twice, for about an hour each time, but one database in another affected data centre had no interruption.
If having a database connection is critical to your app, then the only way in the Azure environment to insure against problems that MS hasn't prepared for (this latest technical problem, earthquakes, meteor strikes) would be to co-locate your SQL data in another data centre. At the moment the most practical way to do this is to use the Sync Framework. There is the ability to copy SQL Azure databases, but this only works within a data centre. With your data located elsewhere, you could then point your app at the new database if the main one becomes unavailable.
While this looks good on paper, though, it may not have helped you with the latest problem, as that affected multiple data centres. If you'd just been making database copies on a regular basis, that might have been enough to get you through. Or not.
(I would have posted this answer on server fault, but I couldn't find the question)
This is just about a programming/architecture issue, but you may also want to ask the question on webmasters.stackexchange.com.
You need to find out the root cause before drawing any conclusions.
However, my guess is that one of two things was the problem:
The ISP connectivity differs between the test system and your production system. Either they use different ISPs, or different lines from the same ISP. When I worked at a hosting company, we made sure that our IP connectivity went through at least two different ISPs who did not share fibre to our premises (and where we could, they had different physical routes to the building; the homing ability of backhoes when there's a critical piece of fibre to dig up is well proven).
Your datacentre had an issue with some shared production infrastructure. This might be edge routers, firewalls, load balancers, intrusion detection systems, traffic shapers, etc. These are typically also only installed on production systems. Defences here involve understanding the architecture and making sure the provider has a (tested!) DR plan for restoring SOME service when things go pear-shaped. The neatest hack I saw here was persuading an IPS (intrusion prevention system) that its own management servers were malicious, so that it couldn't be reconfigured at all.
Just a thought: your DC doesn't host any of the WikiLeaks mirrors, or PayPal/Mastercard/Amazon (who are getting DDoSed by WikiLeaks supporters at the moment)?
