I have two Azure Websites set up - one that serves the client application with no database, another with a database and WebApi solution that the client gets data from.
I'm about to add a new table to the database and populate it with data using a temporary Seed method that I only plan on running once. I'm not sure what the best way to go about it is though.
Right now I have the database initializer set to MigrateDatabaseToLatestVersion and I've tested this update locally several times. Everything seems good to go but the update / seed method takes about 6 minutes to run. I have some questions about concurrency while migrating:
What happens when someone performs CRUD operations against the database while business logic and tables are being updated in this 6-minute window? I mean - the time between when I hit "publish" from VS, and when the new bits are actually deployed. What if the seed method modifies every entry in another table, and a user adds some data mid-seed that doesn't get hit by this critical update? Should I lock the site while doing it just in case (far from ideal...)?
Any general guidance on this process would be fantastic.
Operations like creating a new table or adding new columns should have only a minimal impact on performance and should be transparent, especially if the application applies the recommended pattern for dealing with transient faults (for instance, by leveraging the Enterprise Library's Transient Fault Handling Application Block).
Mass updates or reindexing could cause contention and affect the application's performance or even cause errors. Depending on the case, transient fault handling could work around that as well.
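As an illustration, here is a minimal sketch of configuring automatic retries with EF6's built-in connection resiliency (SqlAzureExecutionStrategy), which is one alternative to the Enterprise Library block; the class name is hypothetical:

```csharp
using System;
using System.Data.Entity;
using System.Data.Entity.SqlServer;

// Hypothetical configuration class: EF6 discovers DbConfiguration subclasses
// in the same assembly as the DbContext and applies them automatically.
public class AppDbConfiguration : DbConfiguration
{
    public AppDbConfiguration()
    {
        // Retry transient SQL Azure failures up to 5 times,
        // with a maximum delay of 30 seconds between retries.
        SetExecutionStrategy(
            "System.Data.SqlClient",
            () => new SqlAzureExecutionStrategy(5, TimeSpan.FromSeconds(30)));
    }
}
```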
Concurrent modifications to data that is being upgraded could cause problems that would be more difficult to deal with. These are some possible approaches:
Maintenance window
The simplest and safest approach is to take the application offline, back up the database, upgrade the database, update the application, test, and bring the application back online.
Read-only mode
This approach avoids making the application completely unavailable, by keeping it online but disabling any feature that changes the database. The users can still query and view data while the application is updated.
Staged upgrade
This approach is based on carefully planned sequences of changes to the database structure and data and to the application code so that at any given stage the application version that is online is compatible with the current database structure.
For example, let's suppose we need to introduce a "date of last purchase" field to a customer record. This sequence could be used:
1. Add the new field to the customer record in the database (without updating the application), with NULL as its default value (see the migration sketch after this list).
2. Update the application so that the date of last purchase field is updated for each new sale. The field is left unchanged for old sales, and at this point the application does not query or show the new field.
3. Execute a batch job on the database to populate the field for all customers where it is still NULL. A delay could be introduced between updates so that the system is not overloaded.
4. Update the application to start querying and showing the new information.
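As a sketch of step 1, assuming EF6 code-first migrations and hypothetical table/column names, the expansion migration could look like this:

```csharp
using System.Data.Entity.Migrations;

// Hypothetical expansion migration: adds the new column as nullable so that
// existing rows and the currently deployed application keep working.
public partial class AddLastPurchaseDate : DbMigration
{
    public override void Up()
    {
        AddColumn("dbo.Customers", "LastPurchaseDate", c => c.DateTime(nullable: true));
    }

    public override void Down()
    {
        DropColumn("dbo.Customers", "LastPurchaseDate");
    }
}
```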
There are several variations of this approach, such as the concept of "expansion scripts" and "contraction scripts" described in Zero-Downtime Database Deployment. This could be used along with feature toggles to change the application's behavior dynamically as the upgrade stages are executed.
New columns could be added to records to indicate that they have been converted. The application logic could be adapted to deal with records in the old version and in the new version concurrently.
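For example, a hedged sketch of what that dual-version handling could look like in application code (the entity, flag, and repository names here are hypothetical):

```csharp
using System;

// Hypothetical entity: IsConverted is the column added by the expansion script.
public class Customer
{
    public int Id { get; set; }
    public bool IsConverted { get; set; }
    public DateTime? LastPurchaseDate { get; set; }
}

public interface ISalesRepository
{
    DateTime? FindLatestSaleDate(int customerId);
}

public static class CustomerReader
{
    // Converted rows use the new denormalized column;
    // old-format rows fall back to computing the value on the fly.
    public static DateTime? GetLastPurchase(Customer customer, ISalesRepository sales)
    {
        return customer.IsConverted
            ? customer.LastPurchaseDate
            : sales.FindLatestSaleDate(customer.Id);
    }
}
```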
Entity Framework may impose some additional limitations on your options, because it generates the SQL statements on behalf of the application, so you would have to take that into consideration when planning the stages.
Staging environment
Changing the production database structure and executing mass data changes is risky business, especially when it must be done in a specific sequence while data is being entered and changed by users. Your options to revert mistakes can be severely limited.
It would be necessary to do extensive testing and simulation in a separate staging environment before executing the upgrade procedures on the production environment.
I agree with the maintenance window idea from Fernando. But here is the approach I would take given your question.
1. Make sure your database is backed up before doing anything (I am assuming it's SQL Azure).
2. Put up a maintenance page on the client application.
3. Run the migration against your database via Visual Studio (I am assuming you are doing this through the Package Manager Console) or via a unit test (a sketch of the unit-test approach follows these steps).
4. Publish the website / Web API sites.
5. Verify your changes.
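If you go the unit-test route, a minimal sketch (assuming EF6 code-first, with the generated Configuration migrations class made accessible to the test project and a placeholder connection string) could be:

```csharp
using System.Data.Entity.Infrastructure;
using System.Data.Entity.Migrations;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class MigrationRunner
{
    [TestMethod]
    public void ApplyPendingMigrations()
    {
        // "Configuration" is the migrations configuration class generated by
        // Enable-Migrations in the Web API project; the connection string is a placeholder.
        var configuration = new Configuration
        {
            TargetDatabase = new DbConnectionInfo(
                "Server=tcp:yourserver.database.windows.net;Database=yourdb;User ID=...;Password=...",
                "System.Data.SqlClient")
        };

        // Applies any pending migrations and then runs the Seed method.
        new DbMigrator(configuration).Update();
    }
}
```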
The main thing about working with the Seed method in Entity Framework is that it's easy to get wrong, and without a proper backup, running against production can get you into trouble fast. I would run it through your test database/environment first (if you have one) to verify that what you want to happen is actually happening.
I am trying to understand the use of deployment slots for hosting my web app using the Azure app service.
I am particularly confused about the ideal way to deal with the database while the swap is performed.
While maintaining two database versions seems like a solution, it adds the complexity of maintaining data across multiple databases to make them consistent.
What are the recommended ways for dealing with database schema and migrations while using blue/green deployments and in particular deployment slots?
Ideally, your staging and production slots would share the same database, so it would not be an issue.
But if you have more slots, then you'd better also use different databases and handle migrations during the release phase.
We've worked through various solutions to this problem for a few years. There's not a toolset that provides a magic bullet for all cases. There are a few solutions:
Smaller databases/trivial changes
If it is possible to execute a migration script on a database that will complete in a second or two, and you can have an easy fallback script, you can execute the script concurrently with the swap. This can also be automated. But it's a higher stress situation and not one I'd recommend. This can even be done with EF Migrations.
Carefully ensure database compatibility between versions
Since we're dealing with a few hundred GB of data that cannot go down, we've just made it a rule that the database has to work with both versions of our application. It's not as awful or impossible as it sounds. For example, net new tables and fields can oftentimes be added before you even perform the swap. We test rollback between versions as part of our QA. If some fields need to be dropped, we wait until after the new version has been deployed and burned in, then run another script to perform the drops after we're sure we won't need rollback. We'll create new stored procedures when one needs to be upgraded so that the new version has its own. Example: sp_foo and sp_foo2.
We've had a lot of success with this strategy.
Slots are a feature specifically of App Services, not of databases. If you want to use a specific DB with a specific slot, then you set up the slot as described here:
https://learn.microsoft.com/en-us/azure/app-service/deploy-staging-slots
When using slots, a swap also swaps the app configuration/settings unless a setting is marked as a slot setting. In the app settings you can have two DB connection strings, each with its own value per slot and the slot setting enabled, so each connection string sticks to its slot. This is shown in this example as well: https://learn.microsoft.com/en-us/azure/app-service/deploy-staging-slots#swap-two-slots
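To illustrate the application side, a minimal sketch for a .NET Framework app, assuming a hypothetical connection string named "MyDb" configured on both slots and marked as a slot setting:

```csharp
using System.Configuration;

// At runtime, App Service overrides the Web.config connection string that has
// the same name, so each slot resolves its own database without code changes.
var connectionString =
    ConfigurationManager.ConnectionStrings["MyDb"].ConnectionString;
```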
I'm working on a new project, and I am still learning about how to use Microservice/Domain Driven Design.
If the recommended architecture is to have a Database-Per-Service, and use Events to achieve eventual consistency, how does the service's database get initialized with all the data that it needs?
If the events indicating an update to the database occurred before the new service/db was ever designed, do I need to start with a copy of the previous database?
Or should I publish a 'New Service On The Block' event, and allow all the other services to vomit everything back to me again? That could be a LOT of chattiness and could cause performance issues.
how does the service's database get initialized with all the data that it needs?
It asks for it; which is to say that you design a protocol so that the service that is spinning up can get copies of all of the information that it needs. That often includes tracking checkpoints, and queries that allow you to ask what has happened since some checkpoint.
Think "pull", rather than "push".
Part of the point of "services": designing the right data boundaries. The need to copy a lot of data between services often indicates that the service boundaries need to be reconsidered.
There is a streaming platform, Apache Kafka, that solves something similar.
With Kafka you would publish events for other services to consume. What makes Kafka special is that events are never deleted (depending on configuration) and can be consumed again by new services spinning up. This feature can be used to initially populate the database (by setting the consumer's offset for a topic to 0 and re-reading the history of events).
There is also another feature, called GlobalKTable, which is a table view of all events for a particular topic. The GlobalKTable holds the latest value for each key (like a primary key) and can be turned into a state store (RocksDB under the hood), which makes it queryable. This state store initializes itself whenever the application starts up, so the application does not need a database of its own, because the state store is kept up to date automatically (consistency is still something to keep in mind). Only for more complex queries would the state store need to be accompanied by a database (with Kafka you would try to pre-compute the results of those queries and make them accessible through a dedicated state store).
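For the replay idea, here is a minimal sketch using the Confluent.Kafka .NET client rather than Kafka Streams (so no GlobalKTable); the topic, group, and broker values are hypothetical:

```csharp
using System.Threading;
using Confluent.Kafka;

// A brand-new service replays the full history of a topic by starting from
// the earliest offset, then keeps consuming new events as they arrive.
var config = new ConsumerConfig
{
    BootstrapServers = "localhost:9092",
    GroupId = "new-service-on-the-block",
    AutoOffsetReset = AutoOffsetReset.Earliest   // read from offset 0 on first run
};

using var consumer = new ConsumerBuilder<string, string>(config).Build();
consumer.Subscribe("customer-events");

while (true)
{
    var result = consumer.Consume(CancellationToken.None);
    // Project result.Message.Value into this service's own database / state store.
}
```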
This would be a complex endeavor, but if it suits your needs it is a fun thing to do!
Here is my situation. I have an extensive REST based API that connects to a MongoDB database using Mongoose. The API is written as a standard "MEAN" stack application.
Currently, when a developer queries the API they're always connecting to the live production database. What I want to do is have an exact duplicate database as a "staging" database, where new data will be added first, vetted over a period of time, and then move to the live database. Then I want developers to be able to query either one simply by modifying their query.
I started looking into this with the Mongoose documentation, and it appears as though the models are tied to the DB connection, and if I want to have multiple connections I also have to have multiple models, one for each connection. This would be a nightmare of WET code and not the path I want to take.
What I want to do is not touch any of my code at all and simply have a switch that changes to the proper database for a given query. So my question is, how can I achieve this? Is it possible? The documentation seems to imply it is not.
Rather than trying to maintain connections to two environments in the same code base, have you considered setting up a staging version of your application? Which database it connects to could be set through an environment variable or some other configuration option.
Developers would then only have to change which instance they query, and you could migrate data from the staging database to the production/live database once you have finished your vetting process.
I am creating a DB schema per customer, so whenever a new customer registers I need to quickly create their schema at runtime.
Option 1
At runtime, use Liquibase (or an equivalent tool) to run all the changesets and generate the latest schema.
Cons:
This is slow; there can be many historical changesets that are no longer relevant (create a table, then drop it a year later).
Liquibase is used here at runtime and not just at "migration time". I'm not sure whether this is a good idea.
Standardizing on Liquibase as the means to create the schema will force all developers to use it during development. We try to avoid loading more tools onto the developers.
Option 2
After each build, we generate a temporary DB using the Liquibase changesets. From that DB we then create a clean schema-creation script based on the current snapshot. When a new customer registers, we just run the clean script, not the full changeset history.
Cons:
The next time I run Liquibase it will try to run from changeset 1. A workaround might be to include in the generated script the creation of the changeset table and the insertion of the latest changesets into it.
New schemas are created using one script, while old schemas go through the changeset process. In theory this might produce a different schema. However, the single script went through the changeset process as well, so I can't think of an exact case that would cause an error; this is a theoretical problem for now.
What do you think?
I would suggest option #1, for consistency.
Database updates can be complex, and the less chance for variation the better. That means your developers should create the Liquibase changeSets as they implement new features and use them to update their own databases, so they know the changeSets run as they expect and that those same steps will be run in QA and all the way through production. It is an extra tool they need to deal with, but it should be easy to integrate into their standard workflow in a way that is easy for them to use.
Similarly, I usually recommend leaving non-relevant historical changeSets in your changeLog because if you remove them you are deviating from your known-good update path. Databases are fast with most operations, especially on a system with little to no data. If you have specific changeSets that are no longer needed and are excessively expensive for one reason or another you can remove them on a case by case basis, but I would suggest doing that very rarely.
You are right that creating a database snapshot from a liquibase script should be identical to running the changeLog--as long as you include the databasechangelog table in the snapshot. However, sticking with an actual Liquibase update all the way through to production will allow you to use features such as contexts, preconditions and changelog parameters that may be helpful in your case as well.
There are two approaches for database deployment:
Build once, deploy many – this approach uses the same principle as native code: compile once and copy the binaries across environments. From a database point of view, it means that the deployment script is generated once and then executed across environments.
Build & Deploy on demand – this approach generates the delta script when needed, in order to handle any out-of-process changes.
If you use the Build & Deploy on demand approach, you can generate the delta script for the entire schema or per work item / changeset.
I recently submitted an upgrade of my app which included a lightweight Core Data migration (including new fields in existing tables and a couple of new tables). I followed every tip regarding this migration, including some I found on this site.
I thoroughly tested the update on three different devices and it all went ok!!!
However, this update is crashing on all my devices and probably on all my customers' devices. I can't explain why this is happening.
Could you please help me understand this debacle?
To truly test your app and migration, you need to run your original app to create a data store according to the original data model. Then you need to run your new app, opening the data store that was generated by the original app. This can be a real pain and is easier (at least initially) to do in the Simulator, because you have more control over the file system and can swap in a saved original data store. On an iDevice you need to regenerate the original data store for each test.
If you are testing on your own development devices, then you have already migrated your data store. Is it possible that your test devices created their data stores with the new data model - and never actually performed a migration?
Generally, I only use automatic migration during beta testing, for quick revisions; other than that I always use a mapping model, so that you have control.
The other issue is that if your model shifts far enough between releases, automatic migration from v1 to v2 could be fine, and v2 to v3 could be OK, but v1 to v3 could be too drastic to be inferred. By making mapping models for those paths, you retain control of the migration.