DB migration: Single creation script vs change sets - data-migration

I am creating a DB schema per customer. So whenever a new customer registers I need to quickly create their schema in runtime.
Option 1
In runtime, use Liquibase (or equivalent) to run all the changesets to generate the latest schema.
Cons:
This is slow: there can be many historical changesets that are no longer relevant (for example, a table that is created and then dropped a year later).
Liquibase is used here at runtime and not just at "migration time". Not sure if this is a good idea.
Standardizing on Liquibase as the means of creating a schema will force all developers to use it during development. We try to avoid loading more tools onto the developers.
Option 2
After each build we generate a temporary DB using Liquibase changesets. Then from the DB we create a clean schema creation script based on the current snapshot. Then when a new customer comes we just run the clean script, not the full change set history.
Cons:
Next time I run Liquibase it will try to run from changeset 1. A workaround might be to have the generation script also create the changeset table and insert the latest changeset into it.
New schemas are created using one script, while old schemas go through the changeset process. In theory this might produce a different schema. However, the single script went through the changeset process as well, so I can't think of an exact case that would cause an error. This is a theoretical problem for now.
What do you think?

I would suggest option #1 for the consistency.
Database updates can be complex, and the less chance for variation the better. That means your developers should create the Liquibase changeSets up front and use them to update their own databases as they implement new features. That way they know the changeSets run as expected, and they know that those same steps will be run in QA and all the way through production. It is an extra tool they need to deal with, but it should be easy to integrate into their standard workflow.
Similarly, I usually recommend leaving non-relevant historical changeSets in your changeLog because if you remove them you are deviating from your known-good update path. Databases are fast with most operations, especially on a system with little to no data. If you have specific changeSets that are no longer needed and are excessively expensive for one reason or another you can remove them on a case by case basis, but I would suggest doing that very rarely.
You are right that creating a database snapshot from a liquibase script should be identical to running the changeLog--as long as you include the databasechangelog table in the snapshot. However, sticking with an actual Liquibase update all the way through to production will allow you to use features such as contexts, preconditions and changelog parameters that may be helpful in your case as well.
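To make the databasechangelog point concrete, here is a minimal Python sketch using sqlite3. It is not Liquibase: the `changelog` table, the `update` helper and the changesets are hypothetical stand-ins for the real tracking table and the real update command. It only illustrates that a schema created from a snapshot which includes the tracking rows will later run just the changesets added after the snapshot was taken.

```python
import sqlite3

# Hypothetical changesets as (id, SQL) pairs; real changelogs live in files.
CHANGESETS = [
    ("1", "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"),
    ("2", "CREATE TABLE legacy_report (id INTEGER PRIMARY KEY)"),
    ("3", "DROP TABLE legacy_report"),
    ("4", "ALTER TABLE customer ADD COLUMN email TEXT"),
]

def update(conn, changesets):
    """Apply changesets not yet recorded in the tracking table; return how many ran."""
    conn.execute("CREATE TABLE IF NOT EXISTS changelog (id TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT id FROM changelog")}
    ran = 0
    for cs_id, sql in changesets:
        if cs_id in applied:
            continue
        conn.execute(sql)
        conn.execute("INSERT INTO changelog (id) VALUES (?)", (cs_id,))
        ran += 1
    conn.commit()
    return ran

def snapshot_script(conn):
    """Dump the schema and the tracking rows as a single creation script."""
    return "\n".join(conn.iterdump())

def schema(conn):
    """Return {table name: [column names]} for comparing two databases."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    return {t: [c[1] for c in conn.execute(f"PRAGMA table_info({t})")] for t in tables}

# Option 1: build a database by running the full history.
build_db = sqlite3.connect(":memory:")
ran_full = update(build_db, CHANGESETS)

# Option 2: create a new customer schema from the snapshot script only.
customer_db = sqlite3.connect(":memory:")
customer_db.executescript(snapshot_script(build_db))

# Later, a new changeset is added; both databases should run only that one.
NEWER = CHANGESETS + [("5", "ALTER TABLE customer ADD COLUMN phone TEXT")]
ran_on_snapshot = update(customer_db, NEWER)
ran_on_build = update(build_db, NEWER)
```

If the snapshot left out the tracking rows, `update` on the customer database would start again from changeset 1 and fail on the first `CREATE TABLE`.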

There are two approaches for database deployment:
Build once deploy many – this approach uses the same principle as native code: compile once and copy the binaries across the environments. From the database point of view, this means that the deploy script is generated once and then executed across environments.
Build & Deploy on demand – this approach generates the delta script when needed in order to handle any out of process changes.
If you use the Build & Deploy on demand approach you can generate the delta script for the entire schema or work-item / changeset.

Related

Recommended ways to deal with database migrations while doing a swap using deployment slots

I am trying to understand the use of deployment slots for hosting my web app using the Azure app service.
I am particularly confused with the ideal ways to deal with the database while the swap is performed.
While maintaining two database versions seems like a solution, it adds the complexity of maintaining data across multiple databases to make them consistent.
What are the recommended ways for dealing with database schema and migrations while using blue/green deployments and in particular deployment slots?
Ideally your staging and production slots would share the same database, so it would not be an issue.
But if you have more slots, then you'd better also work with different databases and handle migrations during the release phase.
We've worked through various solutions to this problem for a few years. There's not a toolset that provides a magic bullet for all cases. There are a few solutions:
Smaller databases/trivial changes
If it is possible to execute a migration script on a database that will complete in a second or two, and you can have an easy fallback script, you can execute the script concurrently with the swap. This can also be automated. But it's a higher stress situation and not one I'd recommend. This can even be done with EF Migrations.
Carefully ensure database compatibility between versions
Since we're dealing with a few hundred GB of data that cannot go down, we've just made it a rule that the database has to work with both versions of our application. It's not as awful or impossible as it sounds. For example, net new tables and fields can oftentimes be added before you even perform the swap. We test rollback between versions as part of our QA. If some fields need to be dropped, we wait until after the new version has been deployed and burned in, then run another script to perform the drops after we're sure we won't need rollback. We'll create new stored procedures when one needs to be upgraded so that the new version has its own. Example: sp_foo and sp_foo2.
We've had a lot of success with this strategy.
Slots are a feature of App Service, not of databases. If you want to use a specific DB with a specific slot, set up the slot as described here:
https://learn.microsoft.com/en-us/azure/app-service/deploy-staging-slots
When you swap slots, app settings and connection strings are swapped as well, unless they are marked as deployment slot settings. So you can define a DB connection string in each slot and enable the deployment slot setting option on it, which keeps each connection string with its own slot. This is shown in the example here as well: https://learn.microsoft.com/en-us/azure/app-service/deploy-staging-slots#swap-two-slots

Cleaning up mongodb after integration tests in node

I have an api written in node with a mongodb back end.
I'm using supertest to automate testing of an api. Of course this results in a lot of changes to the database, and I'd like to get some input on options for managing this. The goal is for each test to have no permanent impact on the database. The database should look exactly the same after the test is finished as it did before the test ran.
In my case, I don't want the database to be dropped or fully emptied out between tests. I need some real data maintained in the database at all times. I just want the changes by the tests themselves to be reverted.
With a relational database, I would put a transaction around each unit test and roll it back after the test was done (pass or fail). As far as I know, this is not an option with mongo.
Some options I have considered:
Fake databases
I've heard of in-memory databases like fongo (which is a Java thing) and tingodb. I haven't used these, but the issue with this type of solution is always that the fake needs good parity with the actual product to stay a viable option. As soon as I use a mongo feature that the fake doesn't support, I'll have a problem unit testing.
Manual cleanup
There is always the option of just having a routine that finds all the data added by the test (marked in some way) and removes it. You'd have to be careful about updates and deletes here. Also there is likely a lot of upkeep making sure the cleanup routine accurately cleans things up.
Database copying
If it were fast enough, maybe having a baseline test database and making a copy of it before each test could work. It'd have to be pretty fast though.
So how do people generally handle this?
I think there is a newer way to test without transactions.
IMHO, with MongoDB >= 3.2 we can set up the inMemory storage engine, which is perfect for this kind of scenario.
1. Start mongo with inMemory
2. Restore database
3. Create a working copy for test
4. Perform a test on working copy
5. Drop working copy
6. If more tests, GOTO 3

Azure Web Site Migrations & Concurrency

I have two Azure Websites set up - one that serves the client application with no database, another with a database and WebApi solution that the client gets data from.
I'm about to add a new table to the database and populate it with data using a temporary Seed method that I only plan on running once. I'm not sure what the best way to go about it is though.
Right now I have the database initializer set to MigrateDatabaseToLatestVersion and I've tested this update locally several times. Everything seems good to go but the update / seed method takes about 6 minutes to run. I have some questions about concurrency while migrating:
What happens when someone performs CRUD operations against the database while business logic and tables are being updated in this 6-minute window? I mean - the time between when I hit "publish" from VS, and when the new bits are actually deployed. What if the seed method modifies every entry in another table, and a user adds some data mid-seed that doesn't get hit by this critical update? Should I lock the site while doing it just in case (far from ideal...)?
Any general guidance on this process would be fantastic.
Operations like creating a new table or adding new columns should have only minimal impact on the performance and be transparent, especially if the application applies the recommended pattern of dealing with transient faults (for instance by leveraging the Enterprise Library).
Mass updates or reindexing could cause contention and affect the application's performance or even cause errors. Depending on the case, transient fault handling could work around that as well.
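As a rough illustration of transient fault handling, here is a minimal retry-with-exponential-backoff sketch in Python. It is not the Enterprise Library API: `TransientError`, `with_retry` and the default values are hypothetical, and deciding which errors count as transient is left to the caller.

```python
import time

class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, deadlocks, throttling)."""

def with_retry(operation, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on TransientError wait base_delay * 2**attempt and retry."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)

# Usage: an operation that fails twice (say, during a mass update) and then succeeds.
calls = {"count": 0}

def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("database busy")
    return "ok"

delays = []  # record the waits instead of actually sleeping
result = with_retry(flaky_query, sleep=delays.append)
```

Retrying only hides short-lived contention. It does not help with the concurrent-modification problem discussed next.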
Concurrent modifications to data that is being upgraded could cause problems that would be more difficult to deal with. These are some possible approaches:
Maintenance window
The simplest and safest approach is to take the application offline, back up the database, upgrade the database, update the application, test and bring the application back online.
Read-only mode
This approach avoids making the application completely unavailable, by keeping it online but disabling any feature that changes the database. The users can still query and view data while the application is updated.
Staged upgrade
This approach is based on carefully planned sequences of changes to the database structure and data and to the application code so that at any given stage the application version that is online is compatible with the current database structure.
For example, let's suppose we need to introduce a "date of last purchase" field to a customer record. This sequence could be used:
Add the new field to the customer record in the database (without updating the application). Set the new field default value as NULL.
Update the application so that for each new sale, the date of last purchase field is updated. For old sales the field is left unchanged, and the application at this point does not query or show the new field.
Execute a batch job on the database to update this field for all customers where it is still NULL. A delay could be introduced between updates so that the system is not overloaded.
Update the application to start querying and showing the new information.
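The four stages above can be sketched with sqlite3 in Python. The table and column names (`customer`, `sale`, `last_purchase`) and the helper functions are assumptions made for the illustration, and the batch size is deliberately tiny. A real batch job would also pause between batches to limit load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sale (id INTEGER PRIMARY KEY, customer_id INTEGER, sold_on TEXT);
    INSERT INTO customer (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO sale (customer_id, sold_on)
        VALUES (1, '2023-01-05'), (1, '2023-03-09'), (2, '2023-02-01');
""")

# Stage 1: add the nullable column; the old application simply ignores it.
conn.execute("ALTER TABLE customer ADD COLUMN last_purchase TEXT DEFAULT NULL")

# Stage 2: the updated application sets the field on every new sale.
def record_sale(conn, customer_id, sold_on):
    conn.execute("INSERT INTO sale (customer_id, sold_on) VALUES (?, ?)",
                 (customer_id, sold_on))
    conn.execute("UPDATE customer SET last_purchase = ? WHERE id = ?",
                 (sold_on, customer_id))
    conn.commit()

record_sale(conn, 2, "2023-04-01")

# Stage 3: backfill customers that are still NULL, a small batch at a time.
def backfill(conn, batch_size=1):
    batches = 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT c.id FROM customer c WHERE c.last_purchase IS NULL "
            "AND EXISTS (SELECT 1 FROM sale s WHERE s.customer_id = c.id) LIMIT ?",
            (batch_size,))]
        if not ids:
            return batches
        for customer_id in ids:
            conn.execute(
                "UPDATE customer SET last_purchase = "
                "(SELECT MAX(sold_on) FROM sale WHERE customer_id = ?) WHERE id = ?",
                (customer_id, customer_id))
        conn.commit()
        batches += 1

batches = backfill(conn)

# Stage 4: the application starts reading the field.
last_purchases = dict(conn.execute("SELECT name, last_purchase FROM customer"))
```

Bob's value comes from the new application code and Ann's from the backfill. Cy has no sales, so the field stays NULL, which the application has to tolerate when it starts showing the data.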
There are several variations of this approach, such as the concept of "expansion scripts" and "contraction scripts" described in Zero-Downtime Database Deployment. This could be used along with feature toggles to change the application's behavior dynamically as the upgrade stages are executed.
New columns could be added to records to indicate that they have been converted. The application logic could be adapted to deal with records in the old version and in the new version concurrently.
The Entity Framework may impose some additional limitations in the options, because it generates the SQL statements on behalf of the application, so you would have to take that into consideration when planning the stages.
Staging environment
Changing the production database structure and executing mass data changes is risky business, especially when it must be done in a specific sequence while data is being entered and changed by users. Your options to revert mistakes can be severely limited.
It would be necessary to do extensive testing and simulation in a separate staging environment before executing the upgrade procedures on the production environment.
I agree with the maintenance window idea from Fernando. But here is the approach I would take given your question.
Make sure your database is backed up before doing anything (I am assuming it's SQL Azure)
Put up a maintenance page on the Client Application
Run the migration against your database via Visual Studio (I am assuming you are doing this through the console) or a unit test
Publish the website/web api websites
Verify your changes.
The main thing about working with the seed method via Entity Framework is that it's easy to get it wrong, and without a proper backup while running against Prod you could get yourself in trouble real fast. I would probably run it through your test database/environment first (if you have one) to verify that what you want is happening.

Entity Framework migrations on legacy database

We have several legacy SQL Server databases that we occasionally make schema changes to. We currently have a utility written in C++ that allows users to update their DB's with these schema changes. The utility currently generates dynamic sql to create all DB objects. I am looking into redoing this and thought EF migrations might be a good way to go. I have read up a bit on the subject and I have a general idea of how it works. But I'm having a bit of a hard time figuring out how I would set it up to replace our current procedure (or if it is even possible). Currently, a client could be on any one of a number of previous versions. I'm assuming I would have to go back to the oldest possible version and create my model/initial migration from that, then generate incremental migrations for each version change in order to support updates from all versions. Is that a correct assumption? Also, currently our clients could be using sql server 2000, 2005, or 2008. Would this have any effect on how I would set things up (or if I even could)? Further, the goal is to create a utility with a (C# - probably WPF) UI that the user can use to manipulate the migrations (up or down, preferably). I've seen a lot of examples of how to manipulate migrations from command-line within package manager but not a lot of stuff on how to create a utility with a friendly UI for upgrading/downgrading DB's in production. Also, I have not seen anything that shows how to create stored procedures in a migration (our DBs rely on some stored procedures). I'm assuming that, if nothing else, I can use the Sql() method to generate a SQL query to create a SP. Is that correct? Is there a better way?
I know my questions are a bit non-specific and I apologize for that. But I'm still in the beginning processes of learning this and I'd like to get an idea of whether or not this is a good way to go. Any guidance would be greatly appreciated.
Thanks,
Dennis
Firstly, on SQL Server support, Entity Framework doesn't really support SQL Server 2000. See this question:
EntityFramework SQL Server 2000?
On the question of supporting all the multiple versions, you have the right idea about needing to generate an initial migration for the oldest version first then incrementally altering the model and generating migrations to support the later versions. This will be a pain as the migrations are opinionated about how they represent the model in the database and you will be doing a lot of messing about to end up with a model and a set of migrations that fully represent that. Specific concerns are indexes, column lengths, data types, stored procedures, triggers, functions, partitioning.
The Sql() function gets you around most issues, and migration functions like CreateIndex and AlterColumn are also helpful.
For automating this, the migrations are available as PowerShell cmdlets, which are themselves just .NET objects and so can be called programmatically.
As this question is a year old, I assume you will have made a decision on whether to do this. My opinion is that it is hard to see that it's worth the effort. If you were re-platforming the code base that uses this database to Entity Framework then it would make sense. Otherwise there are bound to be better tools out there for database version management. My first port of call would be Redgate.

node-mongo-native migration framework

I'm working on a node.js server, and using MongoDB with node-mongo-native.
I'm looking for a db migration framework, similar to Rails migrations. Any recommendations?
I'm not aware of a specific native Node.js tool for doing MongoDB migrations, but you do have the option of using tools written in other languages (for example, Mongoid Rails Migrations).
It's worth noting that the approach to schema design and data modelling in MongoDB is different from relational databases. In particular, there is no requirement for a collection to have a consistent or predeclared schema, so many of the traditional migration actions such as adding and removing columns are not required.
However, migrations which involve data transformations can still be useful.
If your application is expecting data to be in a certain format (eg. you want to split a "name" field into "first name" and "last name") there are several strategies you could use if the idea of using migration tools written in another programming language isn't appealing:
handle data differences in your application logic, so old and new data formats are both acceptable (perhaps "upgrading" records to match a newer format as they are updated)
write a script to do a once off data migration
contribute MongoDB helpers to node-migrate
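The first strategy, accepting both formats and upgrading records as they are touched, might look like the Python sketch below. The field names `first_name` and `last_name` and the rule of splitting on the first space are assumptions for the example. Real names often need a smarter rule.

```python
def upgrade_user(doc):
    """Return the document in the new format, accepting old or new input."""
    if "first_name" in doc and "last_name" in doc:
        return doc  # already in the new format
    upgraded = {key: value for key, value in doc.items() if key != "name"}
    # Assumption: everything after the first space belongs to the last name.
    first, _, last = doc.get("name", "").partition(" ")
    upgraded["first_name"] = first
    upgraded["last_name"] = last
    return upgraded

old_doc = {"_id": 1, "name": "Ada Lovelace"}
new_doc = {"_id": 2, "first_name": "Alan", "last_name": "Turing"}

upgraded_old = upgrade_user(old_doc)
upgraded_new = upgrade_user(new_doc)
```

The application would call a function like this whenever it reads a record and write the upgraded document back when it next saves, so the collection converges on the new format without a bulk migration.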
I've just finished writing a basic migration framework based on node-mongo-native: https://github.com/afloyd/mongo-migrate. It will allow you to migrate up & down, as well as migrating up/down to a specific revision number. It was initially based on node-migrate, but obviously needed to be changed a bit to make it work.
The revision history is stored in mongodb and not on the file system like node-migrate, allowing collaboration on the same project using a single database. Otherwise each developer running migrations could cause migrations to run more than once against a database.
The migrations themselves are file-based, which also helps with collaboration on a single project whether or not each developer is using the same database. When a dev runs the migration, all migration files not already run against his/her database will be run.
Check out the documentation for more info.
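For readers unfamiliar with the idea, here is a small Python sketch of migrating up and down to a specific revision, with the current revision stored in the database itself. It does not show mongo-migrate's actual API: the `migrate` function, the `_revision` key and the plain dict standing in for the database are all hypothetical, and revisions are assumed to be contiguous integers.

```python
# Each migration is (revision, up, down); `db` is a plain dict standing in for a database.
def add_roles(db):
    db["roles"] = []

def drop_roles(db):
    db.pop("roles", None)

def add_active_flag(db):
    for user in db["users"]:
        user["active"] = True

def remove_active_flag(db):
    for user in db["users"]:
        user.pop("active", None)

MIGRATIONS = [
    (1, add_roles, drop_roles),
    (2, add_active_flag, remove_active_flag),
]

def migrate(db, target):
    """Move the database to the target revision, running up or down steps as needed."""
    current = db.setdefault("_revision", 0)  # revision lives in the database itself
    if target > current:
        for revision, up, _ in MIGRATIONS:
            if current < revision <= target:
                up(db)
                db["_revision"] = revision
    else:
        for revision, _, down in reversed(MIGRATIONS):
            if target < revision <= current:
                down(db)
                db["_revision"] = revision - 1
    return db["_revision"]

db = {"users": [{"name": "a"}]}
after_up = migrate(db, 2)        # runs revisions 1 and 2
after_repeat = migrate(db, 2)    # nothing to do: already at revision 2
after_down = migrate(db, 1)      # undoes revision 2 only
```

Because the revision is read from the database rather than from a local file, a second developer running the same call against the shared database does not repeat migrations that were already applied.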

Resources