XPages - Process to update production with minimal downtime

Is there a best-practice process for updating an XPages application in production?
At the moment (after testing against a development server) I will:
Refresh a local replica of the production db with the dev template
Build the local replica
Replicate back to production
If I follow that process, the production system goes down for about 20 minutes after replication, responding with an "HTTP Web Server: Command Not Handled Exception" error. It eventually comes back by itself, reflecting the new change.
So the question(s) are:
Is there a 'better' way to update production which would minimise downtime?
If not, any ideas what the server could be doing for those 20 minutes? There are no application-scope beans to re-initialise, only sessionScope ones. The logs contain lots of these errors...
5/13/14 3:55 PM: Exception Thrown
java.lang.NullPointerException
at com.ibm.xsp.webapp.FacesServletEx.service(FacesServletEx.java:88)
at com.ibm.xsp.webapp.DesignerFacesServlet.service(DesignerFacesServlet.java:103)
at com.ibm.designer.runtime.domino.adapter.ComponentModule.invokeServlet(ComponentModule.java:576)
at com.ibm.domino.xsp.module.nsf.NSFComponentModule.invokeServlet(NSFComponentModule.java:1335)
at com.ibm.designer.runtime.domino.adapter.ComponentModule$AdapterInvoker.invokeServlet(ComponentModule.java:853)
at com.ibm.designer.runtime.domino.adapter.ComponentModule$ServletInvoker.doService(ComponentModule.java:796)
at com.ibm.designer.runtime.domino.adapter.ComponentModule.doService(ComponentModule.java:565)
at com.ibm.domino.xsp.module.nsf.NSFComponentModule.doService(NSFComponentModule.java:1319)
at com.ibm.domino.xsp.module.nsf.NSFService.doServiceInternal(NSFService.java:662)
at com.ibm.domino.xsp.module.nsf.NSFService.doService(NSFService.java:482)
at com.ibm.designer.runtime.domino.adapter.LCDEnvironment.doService(LCDEnvironment.java:350)
at com.ibm.designer.runtime.domino.adapter.LCDEnvironment.service(LCDEnvironment.java:306)
at com.ibm.domino.xsp.bridge.http.engine.XspCmdManager.service(XspCmdManager.java:272)

Here are my thoughts.
I would NEVER have any replication going on between dev and production. That's just way too risky in my opinion and not a Best Practice. Maybe that's not exactly what you're doing; I'm not sure.
Here's what I do, and as far as I know what everyone on my team does:
Our Dev server has a test database. This is a COPY from production. Personally I refresh it now and then so it's basically a snapshot.
We have a local template for all dev work. We program in the template and refresh the test database. Typically we don't need to do a "restart task http" as part of the refresh process UNLESS we make a change to a managed bean or are working with SXD. Then, yes, we refresh the dev application, restart the server, and test.
In the OLD days I would have a production template on the production server. This was not a replica of anything but it inherited design from the DEV template. So to promote updates to production I'd first refresh the production template from the dev version. Then refresh the production app from the production template.
In the world of source control and SourceTree, where you have a feature branch, a develop branch, and a default/production branch, that kinda eliminates the need for a production template. I'm not in love with that, but it is what it is. So we refresh production from our local templates and rely on SourceTree to make sure it's the correct branch at the time we refresh. I think it's a little more risky, but it allows the ability to do real hotfixes and stuff.
Historically I've not needed to do a restart task http, but I've not promoted anything that uses SXD, and even my managed bean promotions have been limited up to this point. But I imagine I will need to do more with restarting the HTTP task.

In my normal use, I go the replace-design route on the production server. This still involves a bit of downtime that scales with the size of the design and the speed of the connection to the server, but it's not too bad. That could be shortened if you did the design replace with a client running on the server itself.
I haven't done this, but my guess is that the absolute least downtime would be to have the production DB follow a named NTF on the server. Do a replace design on that NTF and then run "load design -f proddb.nsf" on the server - I'd think that would be the fastest way to bring the design elements in.
As for the "wait half an hour" problem, I'm not sure what the cause of that would be. I saw something similar on a client's server running 8.5.2 this week, but the delay was more on the order of a minute. I haven't seen anything of the like on my servers. One out-of-the-blue guess: maybe it's related to the "refresh application on design change" property added to xsp.properties in 8.5.3 (and which I've used since). That could explain the "fixes itself in half an hour" thing: without that option the app could take that long to unload itself automatically, whereas turning the option on would cause it to do so immediately.

Related

Live updating Node.js logging configuration

What I'm trying to accomplish:
Update the logging level on a Node microservice, without stopping the existing running service, by detecting when a change to the config file is saved.
The reason: work policies demand different levels of approval based on what gets changed.
Updating a config file is a "standard change" (considered "safe", requiring low ceremony to accomplish.)
Changing the config file and restarting the service is a "normal change" (considered "not safe", requiring VP approval).
This capability will go a long way towards allowing us to improve the logging in our services.
The technical challenge:
Both node-config and bunyan appear to require a restart in order to accept changes.
Results of attempting to do proper research prior to submitting a question:
Live updating Node.js server
Can node-config reload configurations without restarting Node?
(This last post has a solution that worked for someone, but I couldn't get it to work.)
In theory, I should be able to delete both app-level objects using the lines:
delete require.cache[require.resolve('config')];
delete require.cache[require.resolve('logging')];
and then recreate both objects with the new configurations read from the changed config file.
Deleting the config and logging objects may work, but I'm still a Node newbie, so every way I attempted to use this magical line failed for me.
(Caveat: I was able to delete the objects; I just couldn't get them recreated in such a way that my code would use the newly created objects.)
The horrifically ugly attempt of what I'm doing can be found on my github in the "spike_LiveUpdate" directory of:
https://github.com/MalcolmAnderson/uService_stub
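In principle, since bunyan loggers expose a runtime level() setter, the require.cache dance may not even be needed. Here is a minimal sketch of that alternative - re-read the config file directly and push the new level into the existing logger. The file path and the logLevel field are assumptions for illustration, not the actual layout of the spike:

// liveLogLevel.js - minimal sketch, assuming a JSON config like: { "logLevel": "info" }
const fs = require('fs');
const bunyan = require('bunyan');

const CONFIG_PATH = './config/default.json'; // hypothetical path

function readLevel() {
  // Read the file directly instead of require()-ing it, so there is
  // no require.cache entry to invalidate.
  return JSON.parse(fs.readFileSync(CONFIG_PATH, 'utf8')).logLevel;
}

const log = bunyan.createLogger({ name: 'uService_stub', level: readLevel() });

// fs.watchFile polls the file and fires once per save; fs.watch can
// fire several events for a single save on some platforms.
fs.watchFile(CONFIG_PATH, { interval: 2000 }, () => {
  try {
    log.level(readLevel()); // bunyan accepts a new level at runtime
    log.info('log level is now %s', log.level());
  } catch (err) {
    log.warn(err, 'could not reload config; keeping current level');
  }
});

Because only the logger's level changes, the rest of the code keeps using the same logger object, which sidesteps the "my code still uses the old object" problem.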
The only thing we are attempting to change is the log level of the application so that, if we needed to, we could:
bounce the logging level "to 11" for a small handful of minutes,
have the user demonstrate their error,
and then put it back to the normal logging level.
(It would also be a useful tool to have in our back pocket so when the dealers of FUD say, "But if you get the logging level wrong, we will have to use an emergency change to fix it. Sorry, we appreciate your desire to get a better picture of what's happening in the application, but we just can't justify the risk you're proposing.")
Research prior to writing question:
Attempt to solve the problem.
Search for "node live update", "node live configuration update", "node no restart configuration change"
Search for the above, but replacing "live update" with "rolling update" (this got a lot of Kubernetes hits, which may be the way I have to go with this).
Bottom line: Attempting to be able to change both Node-config, and Bunyan logging without a restart.
[For all I know, there's a security reason for not allowing live updates.]
If I understand correctly, you need to change the log level of your application without restarting it.
Given that, there are a couple of approaches I can suggest.
You can create a second server which includes the new deployment of your changed code, route new traffic to the new server, and stop routing new traffic to the old one. When the old server has drained the requests it currently has, you can shut it down and go with the new one, or re-deploy your application to return to normal (see the drain sketch after this list).
You can create a second application layer on the same server, on a different port, and do the same thing as in the first bullet, though I do not recommend this.
If you're not running on Kubernetes, you need to build your own rolling update strategy (k8s does it automatically).
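The "drain" step in the first option doesn't need anything exotic: Node's builtin HTTP server already finishes in-flight requests if you call server.close() before exiting. A rough sketch (the port, signal, and timeout are arbitrary choices, not requirements):

// drain.js - stop accepting new connections, finish in-flight requests, then exit.
const http = require('http');

const server = http.createServer((req, res) => {
  res.end('hello\n');
});
server.listen(3000);

// SIGTERM would typically be sent once the router/load balancer has
// stopped directing new traffic at this instance.
process.on('SIGTERM', () => {
  server.close(() => process.exit(0)); // runs after the last open request ends
  // Safety net: force-exit if some connection never drains.
  setTimeout(() => process.exit(1), 30000).unref();
});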

How do I get CRA on Azure to behave correctly w.r.t. serviceWorker caching?

I'm a little surprised that I can't find this answered already, but I've looked fairly hard. Apologies if I've missed it somewhere obvious.
TL;DR
Question: How do I configure CRA (with SW) and the Azure App Service into which it is deployed, so that they play nicely together?
There is a relatively well-discussed caching gotcha in the (current) default CreateReactApp config, relating to serviceWorkers.
It is lightly-discussed here, in the CRA docs: https://github.com/facebook/create-react-app/blob/master/packages/react-scripts/template/README.md#user-content-opting-out-of-caching (especially, points 5 & 6 from "Offline-First Considerations").
And an extended argument about its original inclusion in CRA can be found here: https://github.com/facebook/create-react-app/issues/2398
I'm not 100% certain of my understanding, so I'm going to start by summarising what I currently believe to be true:
CRA has set up standard cache-busting processes in its main files.
CRA has included a "Service Worker" pattern to do heavy-duty, ultra-sticky caching.
The Service Worker can mostly load the page without even having an internet connection.
In principle the SW will load the page, and then go and ask the server about the latest code.
If it detects a newer version, it will download that, and cache the new version.
It will NOT immediately display the new version, but will use the new version the next time the page is loaded, from its newly-updated cache.
All this is the standard, desired, intended behaviour of Service Workers.
But further to this:
The serviceWorker is NOT cache busted.
Depending on your host server configuration, the serviceWorker itself may get cached.
If that happens, then it will permanently display the stale version of the site, until the serviceWorker is cleared from the cache, and then it will start trying to update the available content.
Default, out-of-the-box Azure configuration does cache the SW, and so far (after 24 hours) doesn't seem to release that cache at all :(
Question 0: Have I got everything above correct, or am I missing something?
I like the sound of the serviceWorker in general - it seems like it's doing a useful job, but it seems very unlikely that CRA has provided something which, when installed out-of-the-box into Azure, is fundamentally broken - the site I've deployed at the moment is (or gives the impression of being) impossible to push updates to!
I believe I know how to turn off the ServiceWorker, apparently including the active steps required to purge it from the browsers of people who have already seen the site once and thus already have an active SW. (Though I don't clearly understand how that code works!?) But I'd rather not do that if I can avoid it - I'd rather understand how to work WITH this feature than how to DISABLE it.
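For reference, one way to work WITH the feature rather than disable it: recent CRA templates let register() in src/index.js take onSuccess/onUpdate callbacks, so you can at least tell the user that a fresh version is sitting in the cache. A sketch, assuming the serviceWorker.js module from the current template (the confirm() prompt is just a placeholder UI):

// src/index.js (excerpt) - sketch against the serviceWorker.js module
// shipped with recent CRA templates, where register() accepts a config object.
import * as serviceWorker from './serviceWorker';

serviceWorker.register({
  onUpdate: (registration) => {
    // A new service worker has fetched fresh content and is waiting.
    // Surface that to the user instead of silently serving the stale
    // version until their next visit.
    if (window.confirm('A new version of this site is available. Reload?')) {
      window.location.reload();
    }
  },
  onSuccess: () => {
    console.log('Content is cached for offline use.');
  },
});

Whether a plain reload is enough to pick up the new assets immediately depends on the template's skipWaiting behaviour, so treat this as a starting point rather than a guaranteed fix.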
So ...
Question: How do I configure CRA (with SW) and the Azure App Service into which it is deployed, so that they play nicely together?
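I can't confirm every Azure configuration, but the underlying fix is the same everywhere: service-worker.js must reach the browser with Cache-Control: no-cache so it is revalidated on every load. If the build happens to be served by a small Node/Express server on App Service (one common arrangement - the paths and port below are assumptions), that header can be set explicitly; the equivalent for pure static IIS hosting would be done in web.config:

// server.js - sketch of serving a CRA build from Express on Azure App
// Service, with the service worker explicitly marked as non-cacheable.
const path = require('path');
const express = require('express');

const app = express();
const build = path.join(__dirname, 'build');

// The worker itself must never be cached long-term; a cached worker
// keeps serving the stale site even after new deployments.
app.get('/service-worker.js', (req, res) => {
  res.set('Cache-Control', 'no-cache');
  res.sendFile(path.join(build, 'service-worker.js'));
});

// Hashed build assets are safe to cache aggressively.
app.use(express.static(build, { maxAge: '1y', index: false }));

// Everything else falls through to the app shell.
app.get('*', (req, res) => res.sendFile(path.join(build, 'index.html')));

app.listen(process.env.PORT || 8080);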

When should an Azure website be restarted, and what are the consequences?

In the Azure Management Portal, you can configure your website. As an example, you can change the PHP version your website is using. When you have edited a configuration option, you have to click “Save”.
So far, so good. But you also have the option to restart your site (by clicking “Restart“ next to “Save”).
My question is, when should you restart your website? Are there some configuration changes that require a restart, and others that don't? I haven't found any hints in the user interface.
Are there other situations that require a restart? Say, the website has been running for a given time without a restart?
Also, what are the consequences of restarting a website? Does it affect cookies/sessions in any way (i.e. delete a user's shopping cart or log them out)? Are there any other consequences I should be aware of?
Generally speaking, you may want to restart your website because of application performance issues. For example, you may have a memory leak in your application, connections not getting closed, or other things that degrade the performance of the application over time. As you monitor your website and observe conditions like this, you may make a decision to restart it. Even better, you may automate the task of restarting when these conditions occur. Anyway, these kinds of things are not unique to Azure Websites; you would take similar actions for a website running on-premises.
As for configuration changes, if you make a change to your web.config file, this change is detected and your website is restarted automatically for you. Similarly, if you make configuration changes in the CONFIG page of your website in the Azure Management Portal, such as application settings, connection strings, etc., then Azure Websites will detect the change to your environment and automatically restart it.
Indeed, restarting a website will result in any session data kept in memory being lost for that instance. Additionally, if you have startup/initialization code that takes time to complete then that will have to be rerun. Again, this is not anything unique to Azure Websites though.

FluentMigrator Migration From Application_Start

I am currently changing our database deployment strategy to use FluentMigrator and have been reading up on how to run it. Some people have suggested that it can be run from Application_Start. I like this idea, but other people are saying no without specifying reasons, so my questions are:
Is it a bad idea to run the database migration on application start and if so why?
We are planning on moving our sites to deploy to Azure Cloud Services; if we don't run the migration from Application_Start, how/when should we run it, considering we want to make the deployment as simple as possible?
Wherever it is run, how do we ensure it runs only once, given that we will have a website and multiple worker roles as well? (We could just ensure the migration code is only called from the website, but in the future we may increase to 2 or more instances; would that mean it could run more than once?)
I would appreciate any insight on how others handle the migration of the database during deployment, particularly from the perspective of deployments to Azure Cloud Services.
EDIT:
Looking at the comment below, I can see the potential problems of running during Application_Start. Perhaps the issue is that I am trying to solve the problem with the wrong tool. FluentMigrator may not be the way to go in our case, as we have a large number of stored procedures, views, etc., so as part of the migration I was going to have to use SQL scripts to keep them at the right version, and I don't think migrating down would be possible.
What I liked about the idea of running during Application_Start was that I could build a single deployment package for Azure, upload it to staging, and the database migration would run and that would be it, rather than running manual scripts, and then just swap into production.
Running migrations during Application_Start can be a viable approach. Especially during development.
However there are some potential problems:
Application_Start will take longer and FluentMigrator will be run every time the App Pool is recycled. Depending on your IIS configuration this could be several times a day.
If you do this in production, users might be affected, e.g. trying to access a table while it is being changed will result in an error.
DBAs don't usually approve.
What happens if the migrations fail on startup? Is your site down then?
My opinion ->
For a site with a decent amount of traffic I would prefer to have a build script and more control over when I change the database schema. For a hobby (or small non-critical project) this approach would be fine.
An alternative approach that I've used in the past is to make your migrations non-breaking - that is, you write your migrations in such a way that they can be deployed before any code changes and work with the existing code. This way, code and migrations can be deployed independently 95% of the time. For example, instead of changing an existing stored procedure you create a new one, or if you want to rename a table column you add a new one.
The benefits of this are:
Your database changes can be applied before any code changes. You're then free to roll back any breaking code changes or breaking migrations.
Breaking migrations won't take the existing site down.
DBAs can run the migrations independently.

Deploying updates to production node.js code

This may be a basic question, but how do I go about efficiently deploying updates to currently running node.js code?
I'm coming from a PHP, JavaScript (client-side) background, where I can just overwrite files when they need updating and the changes are instantly available on the production site.
But in node.js I have to overwrite the existing files, then shut down and re-launch the application. Should I be worried about potential downtime in this? To me it seems like a more risky approach than the PHP (scripting) way, unless I have a server cluster where I can take down one server at a time for updates.
What kind of strategies are available for this?
In my case it's pretty much:
svn up; monit restart node
This Node server is acting as a comet server with long polling clients, so clients just reconnect like they normally would. The first thing the Node server does is grab the current state info from the database, so everything is running smoothly in no time.
I don't think this is really any riskier than doing an svn up to update a bunch of PHP files. If anything it's a little bit safer. When you're updating a big php project, there's a chance (if it's a high traffic site it's basically a 100% chance) that you could be getting requests over the web server while you're still updating. This means that you would be running updated and out-of-date code in the same request. At least with the Node approach, you can update everything and restart the Node server and know that all your code is up to date.
I wouldn't worry too much about downtime, you should be able to keep this so short that chances are no one will notice (kill the process and re-launch it in a bash script or something if you want to keep it to a fraction of a second).
Of more concern however is that many Node applications keep a lot of state information in memory which you're going to lose when you restart it. For example if you were running a chat application it might not remember who users were talking to or what channels/rooms they were in. Dealing with this is more of a design issue though, and very application specific.
If your node.js application 'can't skip a beat', meaning it is under continuous bombardment of incoming requests, you simply can't afford the downtime of a quick restart (even with nodemon). In some cases you simply want a seamless restart of your node.js apps.
To do this I use naught: https://github.com/superjoe30/naught
Zero downtime deployment for your Node.js server using builtin cluster API
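The core idea behind naught can be sketched with the builtin cluster module alone: workers share the listening socket, and on deploy the master replaces them one at a time, so the port is never unserved. A bare-bones version (error handling omitted; SIGHUP as the reload trigger is an arbitrary choice):

// reload.js - bare-bones rolling restart using the builtin cluster API.
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();

  // After new code is on disk, `kill -HUP <master pid>` recycles the
  // workers one by one; each fork picks up the new files.
  process.on('SIGHUP', () => {
    const old = Object.values(cluster.workers);
    const recycle = (i) => {
      if (i >= old.length) return;
      const fresh = cluster.fork();
      fresh.once('listening', () => {
        old[i].disconnect(); // old worker finishes in-flight requests, then exits
        recycle(i + 1);
      });
    };
    recycle(0);
  });
} else {
  http.createServer((req, res) => res.end('ok\n')).listen(3000);
}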
Some Node.js cloud hosting providers (like Nodejitsu or Windows Azure) keep both versions of your site on disk in separate directories and just redirect traffic from one version to the new version once the new version has been fully deployed.
This is usually a built-in feature of Platform as a Service (PaaS) providers. However, if you are managing your servers you'll need to build something to allow for traffic to go from one version to the next once the new one has been fully deployed.
An advantage of this approach is that rollbacks are easy, since the previous version remains intact on disk.
