How to run a migration on Google App Engine - node.js

I have a Node.js app running on Google App Engine.
I want to run sequelize migrations.
Is it possible to run a command from within the Instance of my node.js app?
Essentially something like Heroku's run command, which runs a one-off process inside a Heroku dyno.
If this isn't possible what's the best practice in running migrations?
I could always just add it to the gcp-build script, but that would run on every deploy.

It's not possible to run standalone scripts/apps on GAE; see How do I run custom python script in Google App engine (asked in a Python context, but the general idea applies to all runtimes).
The way I ran my (datastore) migrations was to port the functionality of the migration script itself into the body of an admin-protected handler in my GAE app, which I triggered with an HTTP request to a particular URL. I reworked it a bit to split the potentially long-running migration operation into a sequence of smaller operations (using push task queues), which is much more GAE-friendly. This allowed me to live-test the migration one datastore entity set at a time and only go for multiple sets once I was completely confident in its operation. I also didn't have to worry about eventual consistency (I was using queries to determine the entities to be migrated) - I just repeatedly invoked the migration until there was nothing left to do.
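My handler was Python/datastore-specific, but the same idea carries over to the Node.js/sequelize setup in the question. A minimal, hypothetical sketch (the /admin/migrate route, the X-Migrate-Secret header, the MIGRATE_SECRET env var and the use of sequelize-cli are all assumptions, not part of the original answer):

    const express = require('express');
    const { exec } = require('child_process');

    const app = express();

    app.post('/admin/migrate', (req, res) => {
      // Reject requests that don't carry the shared secret - a stand-in for the
      // "admin-protected" check described above (header name is hypothetical).
      if (req.get('X-Migrate-Secret') !== process.env.MIGRATE_SECRET) {
        return res.status(403).send('Forbidden');
      }
      // Run the sequelize CLI migrations against the configured database.
      exec('npx sequelize-cli db:migrate', (err, stdout, stderr) => {
        if (err) return res.status(500).send(stderr);
        res.send(stdout);
      });
    });

    app.listen(process.env.PORT || 8080);

Triggering the URL once per deploy (and removing or disabling the route afterwards) keeps the migration out of gcp-build while still needing nothing more than an HTTP request.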
Once the migration was completed I removed the respective code (but kept the handler itself for future migrations). As a positive side effect I pretty much had the migration history captured in my repository's history itself.
Potentially of interest: Handling Schema Migrations in App Engine

Related

How to reload tensorflow model in Google Cloud Run server?

I have a web server hosted on Cloud Run that loads a TensorFlow model from cloud file store on start. To know which model to load, it looks up the latest reference in a Postgres db.
Occasionally a retraining script runs using Google Cloud Functions. This stores a new model in cloud file store and a new reference in the Postgres db.
Currently, in order to use this new model I would need to redeploy the Cloud Run instance so it grabs the new model on start. How can I automate using the newest model instead? Something elegant, robust, and scalable would be ideal, but something hacky/clunky yet functional is preferable if it's much easier. This is a throw-away prototype, but it needs to be available and usable.
I have considered a few options but I'm not sure how possible either of them are:
Create some sort of Postgres trigger/notification that the Cloud Run server listens to. I guess this would require another thread; this ups complexity and I'm unsure how multiple threads work with Cloud Run.
Similar, but using HTTP pub/sub: make an endpoint on the server to re-look up and fetch the latest model, and publish to it when the retrainer finishes.
I could deploy a new instance and remove the old one after the retrainer runs. Simple in some regards, but it seems riskier and might be hard to accomplish programmatically.
Your current pattern should implement cache management (because you cache a model). How can you invalidate the cache?
Restart the instance? Cloud Run doesn't let you control the instances. The easiest way is to deploy a new revision to force the current instances to stop and new ones to start.
Set a TTL? It's an option: load a model for XX hours, then reload it from the source. Problem: you could have glitches (some instances serving the new model and others the old one until the cache TTL expires on all the instances).
Offer a cache invalidation mechanism? As said before, it's hard because Cloud Run doesn't allow you to communicate with all the instances directly. So a push mechanism is very hard and tricky to implement (not impossible, but I don't recommend wasting time on it). A pull mechanism is an option: check a "latest updated date" somewhere (a record in Firestore, a file in Cloud Storage, an entry in Cloud SQL, ...) and compare it with your model's updated date. If they match, great; if not, reload the latest model (a minimal sketch of this pull check follows below).
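A minimal sketch of that pull check, assuming a Node.js server and hypothetical table/column names (model_registry, model_path, created_at); loadModel() is just a placeholder for however the model is actually fetched:

    const { Pool } = require('pg');

    const pool = new Pool(); // connection settings come from the PG* env vars
    let cached = { ref: null, model: null };

    async function loadModel(ref) {
      // Placeholder: fetch and deserialize the model referenced by `ref` from
      // your file store (implementation depends on the model format).
      return { ref };
    }

    // Call this at the top of each request handler (or on a timer) instead of
    // loading the model only once at startup.
    async function getModel() {
      const { rows } = await pool.query(
        'SELECT model_path FROM model_registry ORDER BY created_at DESC LIMIT 1'
      );
      const latest = rows[0].model_path;
      if (latest !== cached.ref) {
        // The reference changed since we last loaded: refresh the in-memory copy.
        cached = { ref: latest, model: await loadModel(latest) };
      }
      return cached.model;
    }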
You have several options; it all depends on what you want.
But there is another solution, and it's my preference: every time you have a new model, build a new container with the model already baked in (with Cloud Build) and deploy that new container to Cloud Run.
That solution solves your cache management issue, and you will have better cold start latency for all your new instances (in addition to easier rollback, A/B testing or canary release capability, version management and control, portability, local/other-environment testing, ...).

Can I manually run scripts on App Engine?

Is there a way to run something like "node testscript.js" remotely?
If not, how do you test particular functions on App Engine? I can test them locally, but there are differences when running on App Engine.
If you want to run something in App Engine, you will have to deploy it, and whenever you make changes to the source code, you will have to redeploy it to run the updated code on App Engine. You should test your application locally and thoroughly to be sure it will work as expected when deployed.
With respect to the timeouts, please keep in mind that there are two environments: flexible and standard, where the timeout deadlines differ (60 sec for standard vs 60 min for flexible). Also, you can have long-running requests on App Engine standard if you use the manual scaling option.
You might also look at Cloud Functions, depending on what your scripts do. Some of the options to trigger Cloud Functions are HTTP requests or Direct Triggers.

Heroku workers in dev

I'm looking into using a worker as well as a web process for the first time, as I have to scrape a website. Before I commit to this, I'm just wondering about working in a dev environment: how do jobs in a queue get handled when I'm testing my app before it's pushed to Heroku?
I will probably be using RabbitMQ if that's relevant here.
I guess it depends on what you mean by testing. You can unit test the code that does the scraping in isolation from any queue, and you can provide a mock implementation of the queue operations to handle a goodly portion of your integration tests.
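As one way to do that, assuming a Node.js project (names here are hypothetical), the scraper can depend on a tiny queue interface whose test implementation is purely in-memory:

    // queue.js - a minimal publish/subscribe interface the scraper depends on.
    class InMemoryQueue {
      constructor() {
        this.messages = [];
        this.handlers = [];
      }
      publish(message) {
        // Record the message (handy for test assertions) and deliver it to any
        // subscribed handlers synchronously.
        this.messages.push(message);
        this.handlers.forEach((handle) => handle(message));
      }
      subscribe(handler) {
        this.handlers.push(handler);
      }
    }

    module.exports = { InMemoryQueue };

In production the same publish/subscribe interface would be backed by RabbitMQ (e.g. via an amqplib wrapper); in tests the scraper is wired to InMemoryQueue and assertions run against queue.messages without any broker running.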
I suppose you might want a real instance of the queue for certain tests, but depending on the nature of your project, you might be satisfied with the sorts of tests described in the first paragraph.
If you simply must test the queue operation and/or you want to run a complete copy of production locally, then you'll have to stand up an instance of RabbitMQ. You can stand one up locally or use one of the SaaS providers.
If you have multiple developers working on the project, you might want to make it easy for them by creating something like a Vagrant script that sets up a complete environment in a VM, or better still something like Docker. Doing so also gives you a lot more deployment options (making you less dependent on the Heroku tooling).
Lastly, numerous CI solutions like Travis CI provide instances of popular services for running tests (including RabbitMQ).

Foxx apps debugging workflow?

What is the recommended workflow to debug Foxx applications?
I am currently working on a pretty big application and it seems to me I am doing something wrong, because the way I am proceeding does not seem maintainable at all:
Make your changes in the Foxx app (e.g. new endpoints).
Upload your Foxx app to ArangoDB.
Test your changes (e.g. trigger API calls).
Check the logs to see if something went wrong.
Go to 1.
I experienced great time savings by shifting more of the development workflow to the terminal client arangosh. Especially when debugging more complex endpoints, you can isolate queries and functions and debug each individually in the terminal. When you're done debugging, you merge your code back into the Foxx app and mount it. Require modules as you would in Foxx, and just enter variables as arguments to your functions or queries.
You can use arangosh either directly from the terminal or via the embedded terminal in the ArangoDB web frontend.
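For example, an isolated query test in arangosh might look like this (collection, field and bind names are made up for illustration; in newer ArangoDB versions the module is required as "@arangodb"):

    // Run inside arangosh (directly or via the web frontend's shell).
    var db = require("org/arangodb").db;

    // Isolate the AQL query an endpoint would run and test it by itself:
    var result = db._query(
      "FOR doc IN @@col FILTER doc.status == @status RETURN doc",
      { "@col": "todos", "status": "open" }
    ).toArray();

    print(result.length + " matching documents");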
You may also save some time switching to dev mode, which allows you to have changes in your code directly reflected in the mounted app without fetching, mounting and unmounting each time.
That additional flexibility costs some performance, so make sure to switch back to production mode once your Foxx app is ready for deployment.
When developing a Foxx App, I would suggest using the development mode. This also helps a lot with debugging, as you have faster feedback. This works as follows:
Start arangod with the dev-app-path option like this: arangod --javascript.dev-app-path /PATH/TO/FOXX_APPS /PATH/TO/DB, where the path to Foxx apps is the folder that contains a database folder, which in turn contains your Foxx apps sorted by database. More information can be found here.
Make your changes; no need to deploy the app or anything. The app now automatically reloads on every request. Change, try out, change, try out...
There are currently no debugging capabilities. We are planning to add more support for unit testing of Foxx apps in the near future, so you can have a more TDD-like workflow.

FluentMigrator Migration From Application_Start

I am currently changing our database deployment strategy to use FluentMigrator and have been reading up on how to run it. Some people have suggested that it can be run from Application_Start. I like this idea, but other people are saying no without specifying reasons, so my questions are:
Is it a bad idea to run the database migration on application start, and if so, why?
We are planning on moving our sites to deploy to Azure Cloud Services; if we don't run the migration from Application_Start, how and when should we run it, considering we want to make the deployment as simple as possible?
Wherever it is run, how do we ensure it runs only once, as we will have a website and multiple worker roles as well? (We could just ensure the migration code is only called from the website, but in the future we may increase to 2 or more instances; would that mean it could run more than once?)
I would appreciate any insight into how others handle the migration of the database during deployment, particularly from the perspective of deployments to Azure Cloud Services.
EDIT:
Looking at the comment below, I can see the potential problems of running during Application_Start. Perhaps the issue is that I am trying to solve the problem with the wrong tool; FluentMigrator may not be the way to go in our case, as we have a large number of stored procedures, views, etc., so as part of the migration I was going to have to use SQL scripts to keep them at the right version, and I don't think migrating down would be possible.
What I liked about the idea of running during Application_Start was that I could build a single deployment package for Azure, upload it to staging, have the database migration run, and that would be it, rather than running manual scripts, and then just swap into production.
Running migrations during Application_Start can be a viable approach. Especially during development.
However there are some potential problems:
Application_Start will take longer, and FluentMigrator will run every time the App Pool is recycled. Depending on your IIS configuration, this could be several times a day.
If you do this in production, users might be affected, i.e. trying to access a table while it is being changed will result in an error.
DBAs don't usually approve.
What happens if the migrations fail on startup? Is your site down then?
My opinion ->
For a site with a decent amount of traffic I would prefer to have a build script and more control over when I change the database schema. For a hobby (or small non-critical project) this approach would be fine.
An alternative approach that I've used in the past is to make your migrations non-breaking - that is, you write your migrations in such a way that they can be deployed before any code changes and work with the existing code. This way, code and migrations can be deployed independently 95% of the time. For example, instead of changing an existing stored procedure you create a new one, and if you want to rename a table column you add a new one (a small sketch of the idea follows after the list of benefits below).
The benefits of this are:
Your database changes can be applied before any code changes. You're then free to roll back any breaking code changes or breaking migrations.
Breaking migrations won't take the existing site down.
DBAs can run the migrations independently.
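To illustrate the non-breaking pattern, here is a sketch written as a sequelize migration (matching the first question on this page) rather than FluentMigrator; the table and column names are hypothetical, and the same idea applies in either tool. A rename becomes an additive change:

    'use strict';

    module.exports = {
      async up(queryInterface, Sequelize) {
        // Add the new column alongside the old one instead of renaming it, so
        // code that still reads users.full_name keeps working.
        await queryInterface.addColumn('users', 'display_name', {
          type: Sequelize.STRING,
          allowNull: true,
        });
        // Backfill from the old column; the old column is only dropped in a
        // later migration once no deployed code reads it any more.
        await queryInterface.sequelize.query(
          'UPDATE users SET display_name = full_name WHERE display_name IS NULL'
        );
      },

      async down(queryInterface) {
        await queryInterface.removeColumn('users', 'display_name');
      },
    };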

Resources