Implementing database failover in Azure Service Fabric

My company's application experienced database connection issues this morning, and I had to fail over to our secondary database. In our Azure App Services this was an easy step of changing the connection string in the configuration. However, I could not find an easy way of changing these settings on our Service Fabric services without redeploying.
I'm considering options to allow failover at runtime for these services to a secondary database, but I don't know what the best practices would be. A couple of options I have:
I could create a DNS entry that I manage for our database server and then just switch it to the new server name when I need to fail over.
I could have some sort of REST API to call on my app services that would return whether or not to go to the secondary database.
Any other ideas? I'd like to make failover to the secondary as seamless as possible so it can be done quickly.

Have you considered putting both your primary and secondary database connection strings into your application's config and writing some code that automatically switches between them if it detects a problem? Both of the options you presented put a human in the path, which means your users are going to experience downtime until the human fixes the problem (maybe the human is asleep, or on vacation, or on vacation and asleep).
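A minimal sketch of the automatic switch suggested above, written in Python for brevity rather than in the asker's actual stack. The names connect_with_failover, AllDatabasesUnavailable and attempts_per_target are hypothetical, and the connect callable stands in for whatever database driver the service really uses. It assumes the driver signals an unreachable server by raising ConnectionError.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllDatabasesUnavailable(Exception):
    """Raised when neither the primary nor any secondary could be reached."""


def connect_with_failover(
    connection_strings: Sequence[str],
    connect: Callable[[str], T],
    attempts_per_target: int = 2,
) -> T:
    """Try each connection string in order and return the first connection.

    connection_strings: primary first, then secondaries (both read from config).
    connect: hypothetical driver function that raises ConnectionError on failure.
    """
    errors: List[Tuple[str, Exception]] = []
    for conn_str in connection_strings:
        for _ in range(attempts_per_target):
            try:
                return connect(conn_str)
            except ConnectionError as exc:
                # Remember the failure and retry, then move on to the next target.
                errors.append((conn_str, exc))
    raise AllDatabasesUnavailable(errors)
```

A real service would probably also want a delay between attempts and some way to switch back to the primary once it recovers. Both are left out here to keep the sketch short.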
In Service Fabric, Application (and system) upgrades are always rolling upgrades. Rolling upgrades have the advantage of preventing global outages. For example, suppose at some point you updated your config with the wrong connection string. A global config change might be quick and easy, but now you have a global outage and some upset customers. A rolling upgrade would have caught the error in the first upgrade domain and then rolled back, so only a fraction of your application would have been affected.
You can do a config-only rolling upgrade. This is where you make a change to your config package and then create a differential upgrade package so that only the config changes go out and your service process doesn't have to restart.

Just to post an update to my issue here. SQL Azure now has automatic failover groups. This is described here

Related

When should an Azure website be restarted, and what are the consequences?

In the Azure Management Portal, you can configure your website. As an example, you can change the PHP version your website is using. When you have edited a configuration option, you have to click “Save”.
So far, so good. But you also have the option to restart your site (by clicking “Restart” next to “Save”).
My question is, when should you restart your website? Are there some configuration changes that require a restart, and others that don't? I haven't found any hints in the user interface.
Are there other situations that require a restart? Say, the website has been running for a given time without a restart?
Also, what are the consequences of restarting a website? Does it affect cookies/sessions in any way (i.e. delete a user's shopping cart or log them out)? Are there any other consequences I should be aware of?
Generally speaking, you may want to restart your website because of application performance issues. For example, you may have a memory leak in your application, connections not getting closed, or other things that would degrade the performance of the application over time. As you monitor your website and observe conditions like this, you may decide to restart it. Even better, you could automate the restart when these conditions occur. These kinds of things are not unique to Azure Websites; you would take similar actions for a website running on-premises.
As for configuration changes, if you make a change to your web.config file, this change is detected and your website is restarted automatically for you. Similarly, if you make configuration changes in the CONFIG page of your website in the Azure Management Portal, such as application settings, connection strings, etc., then Azure Websites will detect this change to your environment and automatically restart the site.
Indeed, restarting a website will result in any session data kept in memory being lost for that instance. Additionally, if you have startup/initialization code that takes time to complete then that will have to be rerun. Again, this is not anything unique to Azure Websites though.

FluentMigrator Migration From Application_Start

I am currently changing our database deployment strategy to use FluentMigrator and have been reading up on how to run it. Some people have suggested that it can be run from Application_Start. I like this idea, but other people say no without giving reasons, so my questions are:
Is it a bad idea to run the database migration on application start and if so why?
We are planning to move our sites to Azure Cloud Services. If we don't run the migration from Application_Start, how and when should we run it, considering we want to make the deployment as simple as possible?
Wherever it is run, how do we ensure it runs only once? We will have a website and multiple worker roles. We could just ensure the migration code is only called from the website, but in the future we may increase to 2 or more instances; will that mean that it could run more than once?
I would appreciate any insight on how others handle the migration of the database during deployment, particularly from the perspective of deployments to azure cloud services.
EDIT:
Looking at the comment below, I can see the potential problems of running during Application_Start. Perhaps the issue is that I am trying to solve a problem with the wrong tool. FluentMigrator may not be the way to go in our case, as we have a large number of stored procedures, views, etc. As part of the migration I would have to use SQL scripts to keep them at the right version, and I don't think migrating down would be possible.
What I liked about the idea of running during Application_Start was that I could build a single deployment package for Azure and just upload it to staging, and the database migration would be run and that would be it, rather than running manual scripts, and then just swap into production.
Running migrations during Application_Start can be a viable approach, especially during development.
However there are some potential problems:
Application_Start will take longer and FluentMigrator will be run every time the App Pool is recycled. Depending on your IIS configuration this could be several times a day.
If you do this in production, users might be affected, e.g. trying to access a table while it is being changed will result in an error.
DBAs don't usually approve.
What happens if the migrations fail on startup? Is your site down then?
My opinion ->
For a site with a decent amount of traffic I would prefer to have a build script and more control over when I change the database schema. For a hobby (or small non-critical project) this approach would be fine.
An alternative approach that I've used in the past is to make your migrations non-breaking; that is, you write your migrations in such a way that they can be deployed before any code changes and still work with the existing code. This way code and migrations can be deployed independently 95% of the time. For example, instead of changing an existing stored procedure you create a new one, or if you want to rename a table column you add a new one.
The benefits of this are:
Your database changes can be applied before any code changes. You're then free to roll back any breaking code changes or breaking migrations.
Breaking migrations won't take the existing site down.
DBAs can run the migrations independently.
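To make the "add a new column instead of renaming" idea concrete, here is a small sketch using Python's built-in sqlite3 module rather than FluentMigrator. The customers table, the name and full_name columns, and the function name are all made up for illustration. The migration only adds and backfills, so code that still reads the old column keeps working, and running it a second time (for example from a second instance) changes nothing.

```python
import sqlite3


def apply_non_breaking_migration(conn: sqlite3.Connection) -> None:
    """Add full_name next to the existing name column instead of renaming it."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(customers)")}
    if "full_name" not in existing:
        conn.execute("ALTER TABLE customers ADD COLUMN full_name TEXT")
    # Backfill only rows that have not been copied yet, so reruns are harmless.
    conn.execute("UPDATE customers SET full_name = name WHERE full_name IS NULL")
    conn.commit()
```

Once every deployed version of the code reads full_name, the old column can be dropped in a later, separate migration.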

Cloud service restored to the last production deployment

I updated the production deployment yesterday morning, then made changes to the service files over a remote connection,
adding and updating files, and everything was OK.
This morning all the changes I made after the deployment had been undone and customers were using the old version, which cost us hundreds of thousands of pounds.
I need to know what happened; nothing appeared in the operations log.
Probably what has happened is that Microsoft has updated your servers at the Cloud Centre and re-deployed your application from the original deployment. This is in their terms and conditions: you should not make any important manual changes to the deployment after it is deployed unless they are stored in the portal (environment settings etc.), otherwise they might be lost during updates or reboots.
I learned this the hard way too. I had a cache role with only one instance (I thought it only made sense with one instance) and while updates happened, my whole site went down several times over several days!
PaaS services are stateless, which means the VMs running your service can be destroyed and recreated at any time, at which point the VM will be recreated with the content from your original .cspkg.
For more information see http://blogs.msdn.com/b/kwill/archive/2012/09/19/role-instance-restarts-due-to-os-upgrades.aspx and http://blogs.msdn.com/b/kwill/archive/2012/10/05/windows-azure-disk-partition-preservation.aspx.
As others have said, PaaS Web Roles are stateless. If you're making manual configuration changes to your deployed solution package after it has been auto-deployed then any re-deployment by the Azure fabric will simply deploy the package minus your manual changes. To solve this issue you could use startup tasks to apply your manual changes using a PowerShell script or similar (depending on what you're changing). See http://msdn.microsoft.com/en-us/library/jj129544.aspx.
Note that startup tasks don't just run when a machine gets re-imaged or rebooted.

How to roll back a Windows Azure Cloud Service deployment

As title really, I can't see how you can Roll Back an update to a Windows Azure service/site?
Each "update" has a title, so I'd thought you could see a list of these updates, but I just can't see it listed anywhere?
I use https://manage.windowsazure.com, and these are cloud services.
Details on Rollbacks can be found here: http://msdn.microsoft.com/en-gb/library/windowsazure/hh472157.aspx#RollbackofanUpdate
To quote:
Windows Azure provides flexibility in managing services during an update by letting you initiate additional operations on a service, after the initial update request is accepted by the Windows Azure Fabric Controller. A rollback can only be performed when an update (configuration change) or upgrade is in the in progress state on the deployment. An update or upgrade is considered to be in-progress as long as there is at least one instance of the service which has not yet been updated to the new version. To test whether a rollback is allowed, check the value of the RollbackAllowed flag, returned by Get Deployment and Get Hosted Service Properties operations, is set to true.
Note
It only makes sense to call Rollback on an in-place update or upgrade because VIP swap upgrades involve replacing one entire running instance of your service with another. For more information about swapping VIPs, see How to Deploy a Service Upgrade to Production by Swapping VIPs in Windows Azure.
There is no stack maintained of your previous deployments. If, however, your present deployment somehow fails, then the fabric may use your previous deployment (it tries to initialize previous states). Your roles are just VHDs (virtual hard disks) with a copy of the OS and your installation files running virtually. Whenever you update, a new VHD is allotted and the previous one is destroyed (if the update was successful). Thus there is no way to get the previous deployments back (they are destroyed). You should store your previous builds locally or on a foundation server just in case you need them, as you do now.

SharePoint disaster recovery

What are your disaster recovery plans for Windows SharePoint Services 3.0?
Currently we back up all databases (1 content, admin, search and config) using SQL backup tools, and back up the front-end server via dataprotector.
To test our backups, we use another server farm, restore the content database (following the procedure on TechNet) and create a new application that uses this database. We just have to redeploy solutions on the newly created SharePoint application.
However, we have to change database access credentials (on SQL Server): the user accounts used in production aren't the same as those used on our "test" farm.
In the end, we can restore our content database and access all our sites. Searching doesn't work, but we're investigating.
Is this restore scenario reliable (as in supported by Microsoft)?
You can't really back up and restore both the config database and the search database:
Restoring the config database only works if your new farm has exactly the same server names.
When you restore the search database, the full-text index is not synchronized. However, this is not a problem, as you can just reindex.
As a result, I would say that yes, this is reliable for content. But take care of the following:
You may have to redo some configuration (AAM, managed paths...).
This does not include customizations, so you will want to keep a backup of your solutions.
Reliability is in the eye of the beholder. In this case, if your tests of the restore process are successful, then yes, it is reliable.
A number of my clients run SharePoint (both MOSS and WSS) in virtual environments, SQL Server is also virtualised and backed up both with SQL tools and with Volume Shadow copy.
The advantage of a Virtual Environment is downtime is only as long as it takes your Virtual Server host to boot the images.
If you are not using Virtualisation, then remember to back up transaction logs regularly, as this will make it easier to restore to a given point in the day - it also means that your transaction logs don't grow too big!
I prefer to use the stsadm -o backup command 'for catastrophic backup' as it says in the help. This can be scheduled but requires some maintenance of the backup metadata XML file when you start running out of disk space and need to archive older backups. It has the advantage of transferring over timer jobs (usually) and other configuration because as Nico says, restoring the config database won't work for most situations.
To restore, you can use the user interface which is nice and not have to mess around with much else. I think it restores your solutions as well but haven't tested that extensively.

Resources