I am currently changing our database deployment strategy to use FluentMigrator and have been reading up on how to run it. Some people have suggested that it can be run from Application_Start. I like this idea, but other people are saying no without giving reasons, so my questions are:
Is it a bad idea to run the database migration on application start, and if so, why?
We are planning to move our sites to Azure Cloud Services. If we don't run the migration from Application_Start, how and when should we run it, given that we want to make deployment as simple as possible?
Wherever it is run, how do we ensure it runs only once? We will have a website and multiple worker roles. (We could make sure the migration code is only called from the website, but in the future we may scale to two or more instances; would that mean it could run more than once?)
I would appreciate any insight on how others handle the migration of the database during deployment, particularly from the perspective of deployments to azure cloud services.
EDIT:
Looking at the comment below, I can see the potential problems of running during Application_Start. Perhaps the issue is that I am trying to solve a problem with the wrong tool. FluentMigrator may not be the way to go in our case: we have a large number of stored procedures, views, etc., so as part of the migration I would have to use SQL scripts to keep them at the right version, and I don't think migrating down would be possible.
What I liked about running during Application_Start was that I could build a single deployment package for Azure, upload it to staging, have the database migration run automatically, and then just swap into production, rather than running manual scripts.
Running migrations during Application_Start can be a viable approach, especially during development.
However, there are some potential problems:
Application_Start will take longer, and FluentMigrator will run every time the App Pool is recycled. Depending on your IIS configuration, this could be several times a day.
If you do this in production, users might be affected; for example, trying to access a table while it is being changed will result in an error.
DBAs don't usually approve.
What happens if the migrations fail on startup? Is your site down then?
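To make the "run only once" concern from the question concrete, here is a minimal Python sketch (not FluentMigrator's API) of the two properties a startup-time runner needs: a version table, so that re-running after every App Pool recycle is a no-op, and a lock, so two instances starting together don't both apply the same migration. SQLite's `BEGIN IMMEDIATE` write lock stands in for whatever your real database offers (on SQL Server, something like `sp_getapplock`); the migration list and the `schema_version` table name are hypothetical.

```python
import sqlite3

# Hypothetical migrations: (version, SQL) pairs, applied in ascending order.
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE customer ADD COLUMN email TEXT"),
]


def migrate(db_path: str) -> list:
    """Apply pending migrations exactly once; return the versions this call applied."""
    # isolation_level=None disables implicit transactions; we issue BEGIN ourselves.
    # timeout: how long a second instance waits for the lock before giving up.
    conn = sqlite3.connect(db_path, isolation_level=None, timeout=30)
    applied_now = []
    try:
        # BEGIN IMMEDIATE takes the write lock up front, so an instance that
        # starts at the same moment blocks here instead of racing this one.
        conn.execute("BEGIN IMMEDIATE")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
        )
        done = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
        for version, sql in MIGRATIONS:
            if version in done:
                continue  # applied on an earlier start or by another instance
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
            applied_now.append(version)
        conn.execute("COMMIT")
    except Exception:
        # Roll back everything so a failed start leaves the schema unchanged.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()
    return applied_now
```

Applying everything in one transaction means a failed migration leaves the schema as it was, but the site still fails to start, which is exactly the last problem listed above.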
My opinion:
For a site with a decent amount of traffic I would prefer to have a build script and more control over when I change the database schema. For a hobby (or small non-critical project) this approach would be fine.
An alternative approach that I've used in the past is to make your migrations non-breaking: write them so that they can be deployed before any code changes and still work with the existing code. This way code and migrations can be deployed independently 95% of the time. For example, instead of changing an existing stored procedure, you create a new one; if you want to rename a table column, you add a new one.
The benefits of this are:
Your database changes can be applied before any code changes. You're then free to roll back any breaking code changes or breaking migrations.
Breaking migrations won't take the existing site down.
DBAs can run the migrations independently.
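As an illustration of a non-breaking change, here is a Python sketch of a hypothetical "rename" of `customer.fullname` to `customer.display_name`. The migration only adds the new column and copies the data across, leaving the old column in place so the currently deployed code keeps working. SQLite stands in for the real database, and the table and column names are made up for the example.

```python
import sqlite3


def add_display_name_column(conn: sqlite3.Connection) -> None:
    """Non-breaking 'rename': add the new column and backfill it, keep the old one."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(customer)")}
    if "display_name" not in columns:  # idempotent: safe to run more than once
        conn.execute("ALTER TABLE customer ADD COLUMN display_name TEXT")
    # Copy values across; rows written later by old code can be backfilled by rerunning.
    conn.execute(
        "UPDATE customer SET display_name = fullname WHERE display_name IS NULL"
    )
    conn.commit()
```

Old code that reads `fullname` is unaffected and new code can read `display_name`. Dropping `fullname` would be a separate, later migration, run once no deployed code uses it.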
For about 18 months now I've been working in Node, and for the last 6 months I've been slowly migrating my existing WordPress websites to Next.js.
To date, I've been deploying to production manually. I log into my production server, check out the latest release from GitHub, build, and do a pm2 restart.
Even though the above workflow seems to be the most commonly documented around the internet, it's always felt a little wrong to me.
Recently, I found myself in a situation where I needed to customise some 3rd party code. So, my main code now has a line in package.json that says
{
...
"dependencies": {
...
"react-share": "file:../react-share/react-share-4.4.1.tgz",
...
},
...
}
which implies that I'm going to check out my custom react-share, build it somewhere on the production server, change this line to point to wherever I put it, and then rebuild.
Also, I'm using Prisma, which means that every time I deploy, before I do a build, I need to do an npx prisma generate to create the client.
This now all seems really, really wrong.
I don't know how a "simple" CI/CD environment might look, but whatever it looks like, it feels like overkill. It's just me doing development, and my production environment is a single EC2 server sitting behind AWS CloudFront.
It seems to me that I should be doing something more/different than what I'm currently doing, in service to someday moving to a CI/CD model, if/when I have a whole team working on this, or sufficient users that I have multiple load-balanced servers and need production to be continually up.
In particular, it feels like I shouldn't be building on the production server.
Are there any intermediary step(s) I can/should be taking for faster/less-error-prone/less-down-time deployment to a single EC2 instance for Next/Node apps, between manually deploying as I am currently, and some sort of CI/CD setup? Or are my only choices to do what I'm doing now, or go research how to do CI/CD?
You're approaching the initial stages of what is technically called DevOps, if you're not there already, as your context suggests. What you're asking about is a broad topic (that is an understatement), and explaining everything here would be like writing an article about it, at the very least.
However, I'll give you an overview of how to approach this.
I don't know how a "simple" CI/CD environment might look, but whatever it looks like, it feels like overkill.
Simplicity and complexity are relative terms. A system that is complicated for one person might be simple for another. CI/CD doesn't define any laws you need to follow to create a perfect deployment procedure, as everyone's deployment requirements are unique (to some extent).
In bullet points, here is what you need to figure out before you start setting up CI/CD:
The sequence of steps your deployment procedure needs in order to deploy your latest version. Since you've been deploying manually, you already know your steps. All you need to do is fine-tune each step so that it doesn't require manual intervention when it is executed automatically by the CI program.
Choose a CI program, like Travis CI or CircleCI; if you're using GitHub, it has its own GitHub Actions for the purpose. Read their documentation for more details. Your CI program will be responsible for executing your deployment steps, which you'll describe to it in whichever format it understands (mostly .yml).
The CI program will execute your steps on your behalf, based on a condition you provide (like code being pushed to the prod branch). It will execute the commands on a machine (like your EC2 instance). Specifically, with GitHub Actions a runner is responsible for running your commands on your machine, and the runner must be set up beforehand on the instance you intend to deploy your code to. More details on runners can be found in the relevant documentation.
Since the runner will actually execute the commands on your machine, make sure that all required commands and parameters, including the relevant files and directories, are accessible to the runner program, at least from a permissions point of view. For example, running npx prisma generate requires that the npx command is available and executable on the system, and that the folders in which the command creates, reads, updates or deletes files are accessible to the runner. The same applies to all other commands.
Get comfortable with bash scripting as well.
If your steps involve dynamic information, like the line in your package.json that needs to be updated, a custom bash script that makes the update automatically will help, for instance. There are, however, several other ways, depending on the specific nature of the dynamic changes.
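The answer suggests bash; purely to keep one language across these examples, here is the same idea as a Python sketch. `set_dependency` rewrites the `react-share` entry in `package.json` to wherever the customised tarball was built, and `run_steps` runs the deployment steps in order and stops at the first failure, which matters on a CI runner because nobody is watching for a half-finished deploy. The steps mirror the question's (`npx prisma generate`, build, `pm2 restart`); the `npm run build` script, the pm2 process name `my-next-app` and any target path you pass in are assumptions.

```python
import json
import subprocess
from pathlib import Path


def set_dependency(package_json: Path, name: str, spec: str) -> None:
    """Point one dependency in package.json at a new spec, e.g. a file: tarball."""
    data = json.loads(package_json.read_text())
    data.setdefault("dependencies", {})[name] = spec
    package_json.write_text(json.dumps(data, indent=2) + "\n")


# Steps from the question. `npm install` (not `npm ci`) because package.json was
# just edited and the lock file may no longer match it.
STEPS = [
    ["npm", "install"],
    ["npx", "prisma", "generate"],
    ["npm", "run", "build"],            # assumes a "build" script in package.json
    ["pm2", "restart", "my-next-app"],  # hypothetical pm2 process name
]


def run_steps(steps, cwd: Path) -> None:
    """Run each step in order; check=True raises CalledProcessError on failure."""
    for cmd in steps:
        print("+", " ".join(cmd), flush=True)  # echo the command, like `set -x`
        subprocess.run(cmd, cwd=cwd, check=True)
```

A pipeline job would call `set_dependency(Path("package.json"), "react-share", "file:<wherever the tarball was built>")` and then `run_steps(STEPS, repo_dir)`.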
The points above are a huge (by huge, I mean astronomically huge) oversimplification of how CI/CD pipelines are set up, but I hope they give you the idea at least.
In particular, it feels like I shouldn't be building on the production server.
Your feeling is legitimate. You should replicate your production environment (including deployment procedures) in a separate development environment, as closely as possible, so that all your experiments, development and testing happen separately from production; after successful evaluation in the development environment, deploy to production. Steps like building will most likely be done in both environments, as building is something your program needs in order to run, irrespective of the environment it runs in. Your future team will appreciate this separation of environments.
if/when I have a whole team working on this, or sufficient users that I have multiple load-balanced servers and need production to be continually up.
Again, this short statement is in itself a proper domain of IT, known as system design, in which, to put it simply, you or your team create an architecture for your whole system that supports your business requirements and scales as your audience grows. A simple Stack Overflow Q&A isn't enough to explain it.
Therefore,
or go research how to do CI/CD?
is what I'd recommend, and what you should also feel is the right way ahead after reading everything above.
Useful references to begin with (I'm not endorsing any particular resources; you can search for relevant or better ones too):
GitHub Actions self-hosted runners
System Design - Getting started
Bash scripting
Development, Staging, Production
I have a web server hosted on Cloud Run that loads a TensorFlow model from a cloud file store on start. To know which model to load, it looks up the latest reference in a Postgres database.
Occasionally a retrain script runs using Google Cloud Functions. This stores a new model in the cloud file store and a new reference in the Postgres database.
Currently, to use this new model I would need to redeploy the Cloud Run service so it grabs the new model on start. How can I automate using the newest model instead? Of course, something elegant, robust and scalable is ideal, but if something hacky/clunky but functional is much easier, that would be preferred. This is a throw-away prototype, but it needs to be available and usable.
I have considered a few options, but I'm not sure how feasible any of them are:
Create some sort of Postgres trigger/notification that the Cloud Run server listens to. I guess this would require another thread. This increases complexity, and I'm unsure how multiple threads work with Cloud Run.
Similar, but use HTTP pub/sub: make an endpoint on the server that looks up and loads the latest model again, and publish to it when the retrainer finishes.
Deploy a new instance and remove the old one after the retrainer runs. Simple in some regards, but it seems riskier and might be hard to accomplish programmatically.
Your current pattern needs cache management (because you cache a model). How can you invalidate the cache?
Restart the instance? Cloud Run doesn't allow you to control the instances. The easiest way is to deploy a new revision, which forces the current instances to stop and new ones to start.
Setting a TTL? It's an option: keep a model for XX hours, then reload it from the source. Problem: you could have glitches (some instances with the new model and some with the old one, until the cache TTL has expired on all instances).
Offering a cache invalidation mechanism? As said before, it's hard because Cloud Run doesn't let you communicate with all the instances directly. So a push mechanism is very hard and tricky to implement (not impossible, but I don't recommend wasting time on it). A pull mechanism is an option: check a "latest updated date" somewhere (a record in Firestore, a file in Cloud Storage, an entry in Cloud SQL, ...) and compare it with your model's updated date. If they match, great. If not, reload the latest model.
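A minimal Python sketch of the pull option combined with a check interval: each request calls `get()`, which at most once per `check_interval` seconds asks for the latest model reference (in the question, the newest row in the Postgres table) and reloads only if it changed. `fetch_latest_ref` and `load_model` are hypothetical callables you would supply (for example a SQL query and a `tf.keras.models.load_model` call). Instances can still disagree for up to one check interval, which is the glitch described above.

```python
import threading
import time
from typing import Any, Callable, Optional


class ModelCache:
    """Pull-based cache: reload the model when the latest reference changes."""

    def __init__(self, fetch_latest_ref: Callable[[], str],
                 load_model: Callable[[str], Any], check_interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch_latest_ref = fetch_latest_ref
        self._load_model = load_model
        self._check_interval = check_interval
        self._clock = clock  # injectable so the timing can be tested
        self._lock = threading.Lock()  # request handlers may run concurrently
        self._ref: Optional[str] = None
        self._model: Any = None
        self._last_check = float("-inf")  # forces a check on the first call

    def get(self) -> Any:
        with self._lock:
            now = self._clock()
            if now - self._last_check >= self._check_interval:
                self._last_check = now
                latest = self._fetch_latest_ref()
                if latest != self._ref:
                    # Assign only after a successful load, so a failed load
                    # leaves the previous model in place.
                    self._model = self._load_model(latest)
                    self._ref = latest
            return self._model
```

Holding the lock during a reload makes concurrent requests wait for that one load, which is a simple trade-off that seems acceptable for a prototype.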
You have several solutions; it all depends on what you want.
But there is another solution, which is my preference: every time you have a new model, build a new container with the new model already in it (with Cloud Build) and deploy that new container to Cloud Run.
That solution solves your cache management issue, and you will get better cold-start latency for all your new instances (in addition to easier rollback, A/B testing or canary release capability, version management and control, portability, local/other environment testing, ...).
I updated our production deployment yesterday morning, then made changes to the service files over a remote connection,
adding and updating files, and everything was OK.
This morning, all the changes I made after the deployment had been undone and customers were using the old version. This has cost us hundreds of thousands of pounds.
I need to know what happened; nothing appeared in the operations log.
Probably what has happened is that Microsoft has updated your servers at the Cloud Centre and re-deployed your application from the original deployment. This is in their terms and conditions, you should not make any important manual changes to the deployment after it is deployed unless they are stored in the portal (environment settings etc.), otherwise they might be lost during updates or reboots.
I learned this the hard way too. I had a cache role with only one instance (I thought it only made sense with one instance) and while updates happened, my whole site went down several times over several days!
PaaS services are stateless, which means the VMs running your service can be destroyed and recreated at any time, at which point the VM will be recreated with the content from your original .cspkg.
For more information see http://blogs.msdn.com/b/kwill/archive/2012/09/19/role-instance-restarts-due-to-os-upgrades.aspx and http://blogs.msdn.com/b/kwill/archive/2012/10/05/windows-azure-disk-partition-preservation.aspx.
As others have said, PaaS web roles are stateless. If you make manual configuration changes to your solution package after it has been deployed, any redeployment by the Azure fabric will simply deploy the package without your manual changes. To solve this, you could use startup tasks to apply your manual changes using a PowerShell script or similar (depending on what you're changing). See http://msdn.microsoft.com/en-us/library/jj129544.aspx.
Note that startup tasks don't run only when a machine gets re-imaged or rebooted; they can run every time the role starts, so write them to be safe to run more than once.
We are using Azure Blob Storage in all our projects. Over the lifetime of a project, the naming conventions for files in Azure change: sometimes we want to rename containers, remove extra folders and do other clean-up operations.
But Azure does not make it easy to rename things; we have to copy and then delete.
We can also change the naming convention locally during development, but then we need to remember to perform exactly the same operation on production storage when we deploy new versions.
At the same time, we use Entity Framework migrations: when we change the data model, a migration script is created. Then we run "update-database" and the DB is updated. The same is run automatically by our deployment scripts: check whether the production DB needs to be updated, and update it if needed.
It would be good if we could have the same migration goodness for Azure Storage: check whether all the migration scripts have been applied, run the missing ones, and keep a reference to the latest executed script somewhere in the containers.
Does such a thing exist, or should I have a go at implementing something myself?
No, such functionality/behavior does not exist. And remember that EF migrations are supported by, and are part of, EF itself, not the database! So when you talk about Azure Blob Storage: as a service, it does not provide such functionality, in the same way that SQL Server itself does not.
As to whether such a library/code exists: no, there isn't one.
You are raising a very interesting question though!
I personally am not a big fan of "migrations". You can use them in the early stages of the development life cycle, but once you hit GA/production, you have to be very careful about what you do. EF migrations might be fine with small databases, but are you willing to run migrations on a DB whose tables hold millions of records of production data? Same with blobs: 100 or 1,000 blobs might be fine, but how about 2M? Are you really willing to write code that goes through 2M entities and performs operations on them, and run it as part of your build/deploy process? I would not.
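If you do decide to build it yourself despite these caveats, the mechanism described in the question can be sketched without any Azure-specific API: number the migration steps, store the number of the last one executed in a marker, and on deploy run only the steps after it. In this Python sketch a plain dict stands in for a blob container; the marker name, the helper functions and the blob names are hypothetical. With the real Azure SDK the marker would be a small blob, and `rename_blob` would be the copy-then-delete the question mentions.

```python
from typing import Callable, Dict, List, Tuple

Storage = Dict[str, bytes]  # stand-in for a blob container: name -> content

MARKER = "_migrations/last_applied"  # hypothetical marker blob name


def rename_blob(store: Storage, old: str, new: str) -> None:
    """No native rename: copy to the new name, then delete the old one."""
    store[new] = store[old]
    del store[old]


def get_marker(store: Storage) -> int:
    return int(store.get(MARKER, b"0"))  # 0 means nothing has been applied yet


def set_marker(store: Storage, version: int) -> None:
    store[MARKER] = str(version).encode()


def run_pending(store: Storage,
                migrations: List[Tuple[int, Callable[[Storage], None]]]) -> List[int]:
    """Run migration steps newer than the marker, recording each one as it completes."""
    applied = []
    for version, step in sorted(migrations, key=lambda m: m[0]):
        if version <= get_marker(store):
            continue  # already executed on an earlier deploy
        step(store)
        set_marker(store, version)  # record progress so a rerun skips this step
        applied.append(version)
    return applied
```

Because the marker is only advanced after a step finishes, a step that fails halfway is retried on the next deploy, so each step would need to be safe to rerun.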
I have already spent a lot of time googling for a solution, but I'm stuck!
I have an MVC application and I'm trying to do "integration testing" of my views using Coypu and SpecFlow, but I don't know how I should manage the IIS server for this. Is there a way to actually run the server (at the start of the tests) and make it use a special "test" DB (for example an in-memory RavenDB) that is emptied after each scenario (and filled during the background)?
Is there a better or simpler way to do this?
I'm fairly new to this too, so take my answer with a pinch of salt, but as no one else has answered...
Is there a way to actually run the server (first start of tests) ...
You could use IIS Express, which can be run from the command line. You can spin up your website before any tests run (which I believe you can do with the [BeforeTestRun] attribute in SpecFlow) by launching it via System.Diagnostics.Process.
The actual command line would be something like:
iisexpress.exe /path:c:\iisexpress\<your-site-published-to-filepath> /port:<anyport> /clr:v2.0
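The start-before, stop-after pattern can be sketched in Python with `subprocess` and a port poll; in SpecFlow the equivalent would be `Process.Start` in a `[BeforeTestRun]` hook and stopping the process in an `[AfterTestRun]` hook. The command is whatever starts your site (the `iisexpress.exe` line above on Windows); the 30-second timeout and the poll interval are arbitrary assumptions.

```python
import socket
import subprocess
import time
from typing import List


def start_server(cmd: List[str], port: int, timeout: float = 30.0) -> subprocess.Popen:
    """Start the site process and block until it accepts TCP connections."""
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            raise RuntimeError(f"server exited early with code {proc.returncode}")
        try:
            with socket.create_connection(("localhost", port), timeout=1):
                return proc  # port is open: safe to start the UI tests
        except OSError:
            time.sleep(0.2)  # not listening yet; poll again
    proc.kill()
    proc.wait()
    raise TimeoutError(f"nothing listening on port {port} after {timeout}s")


def stop_server(proc: subprocess.Popen) -> None:
    """Stop the server after the test run, forcing it if it doesn't exit."""
    proc.terminate()
    try:
        proc.wait(timeout=10)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
```

Waiting for the port matters because UI tests that start before the site is listening fail with connection errors that look like application bugs.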
... and making the server use a special "test" DB (for example an in-memory RavenDB) emptied after each scenario (and filled during the background).
To use a special test DB, I guess it depends on how your data access works. If you can swap in an in-memory DB fairly easily, then I guess you could do that. However, my understanding is that integration tests should be as close to the production environment as possible, so if possible use the same DBMS you use in production.
What I'm doing is simply restoring my test DB from a known backup of the prod DB each time before the tests run. Again, I can call this via the command line/Process before my tests run. For my DB it's a fairly small dataset, and I can restore just the tables relevant to my tests, so the overhead isn't too prohibitive for integration tests. (It wouldn't be acceptable for unit tests, however, which is where you would probably have mock repositories or in-memory data.)
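The restore-before-each-run idea, sketched in Python with SQLite's online backup API (`sqlite3.Connection.backup`, Python 3.7+) standing in for your real DBMS's restore command: copy a known-good snapshot over the test database so every run starts from the same data. The snapshot and test database paths are hypothetical.

```python
import sqlite3


def restore_test_db(snapshot_path: str, test_db_path: str) -> None:
    """Overwrite the test database with the contents of a known-good snapshot."""
    src = sqlite3.connect(snapshot_path)
    dst = sqlite3.connect(test_db_path)
    try:
        # The backup replaces everything in dst, including rows that earlier
        # test runs inserted, with the snapshot's contents.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

Called from a before-test-run hook (or before each scenario, if the dataset is small enough), this gives every run the same starting state.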
Since you're already using SpecFlow take a look at SpecRun (http://www.specrun.com/).
It's a test runner which is designed for SpecFlow tests and adds all sorts of capabilities, from small conveniences like better formatting of the Test names in the Test Explorer to support for running the same SpecFlow test against multiple targets and config file transformations.
With SpecRun you define a "Profile" which will be used to run your tests, not dissimilar to the VS .runsettings file. In there you can specify:
<DeploymentTransformation>
<Steps>
<IISExpress webAppFolder="..\..\MyProject.Web" port="5555"/>
</Steps>
</DeploymentTransformation>
SpecRun will then start up an IISExpress instance running that Website before running your tests. In the same place you can also set up custom Deployment Transformations (using the standard App.Config transformations) to override the connection strings in your app's Web.config so that it points to the in-memory DB.
The only problem I've had with SpecRun is that the documentation isn't great: there are lots of video demonstrations, but I'd much rather have a few written tutorials. I guess that's what Stack Overflow is here for.