How to reload a TensorFlow model in a Google Cloud Run server? - python-3.x

I have a web server hosted on Cloud Run that loads a TensorFlow model from cloud file storage on start. To know which model to load, it looks up the latest reference in a PostgreSQL database.
Occasionally a retraining script runs via Google Cloud Functions. This stores a new model in cloud file storage and a new reference in the PostgreSQL database.
Currently, in order to use the new model I would need to redeploy the Cloud Run service so it grabs the new model on start. How can I automate picking up the newest model instead? Something elegant, robust, and scalable is ideal, but if something hacky/clunky yet functional is much easier, that would be preferred. This is a throw-away prototype, but it needs to be available and usable.
I have considered a few options, but I'm not sure how feasible any of them are:
Create some sort of Postgres trigger/notification that the Cloud Run server listens to. I guess this would require another thread, which adds complexity, and I'm unsure how multiple threads work with Cloud Run.
Similarly, use HTTP pub/sub: make an endpoint on the server that re-looks-up and loads the latest model, and publish to it when the retrainer finishes (a sketch of such an endpoint follows this list).
Deploy a new instance and remove the old one after the retrainer runs. Simple in some regards, but it seems riskier and might be hard to accomplish programmatically.
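For the second option, a minimal sketch of what the reload endpoint could look like, assuming psycopg2, a hypothetical models table with model_path/created_at columns, and a Keras SavedModel in a bucket:

```python
# Minimal sketch: an HTTP endpoint the retrainer (or a Pub/Sub push
# subscription) can call to make the running service reload the newest model.
# The table/column names, env vars and model path layout are hypothetical.
import os

import psycopg2
import tensorflow as tf
from flask import Flask

app = Flask(__name__)
model = None  # loaded on startup / first reload


def load_latest_model():
    """Look up the newest model reference in PostgreSQL and load it."""
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT model_path FROM models ORDER BY created_at DESC LIMIT 1"
            )
            (model_path,) = cur.fetchone()
    finally:
        conn.close()
    # e.g. a Cloud Storage path such as gs://my-bucket/models/2021-01-01/
    return tf.keras.models.load_model(model_path)


@app.route("/reload-model", methods=["POST"])
def reload_model():
    global model
    model = load_latest_model()
    return "reloaded", 200
```

Note that with more than one Cloud Run instance serving traffic, a single request only reaches one instance; that limitation is discussed in the answer below.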

Your current pattern should implement cache management (because you cache a model). So, how can you invalidate the cache?
Restart the instance? Cloud Run doesn't let you control the instances. The easiest way is to redeploy a new revision, which forces the current instances to stop and new ones to start.
Set a TTL? It's an option: load a model, keep it for XX hours, and then reload it from the source. Problem: you can have glitches (some instances with the new model and some with the old one until the cache TTL expires on all instances).
Offer a cache-invalidation mechanism? As said before, it's hard because Cloud Run doesn't let you communicate with all the instances directly. So a push mechanism is very hard and tricky to implement (not impossible, but I don't recommend wasting time on it). A pull mechanism is an option: check a "last updated date" somewhere (a record in Firestore, a file in Cloud Storage, an entry in Cloud SQL, ...) and compare it with your model's updated date. If they match, great; if not, reload the latest model.
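A minimal sketch of that pull mechanism, assuming the "last updated date" is the created_at column of the same hypothetical models table in Cloud SQL (PostgreSQL) and that request handling goes through get_model():

```python
# Pull mechanism sketch: keep the model cached, but periodically compare its
# timestamp with the latest reference in the database and reload when stale.
# Table/column names and env vars are illustrative.
import os
import time

import psycopg2
import tensorflow as tf

CHECK_INTERVAL_S = 300  # how often to look for a newer model

_model = None
_model_updated_at = None
_last_check = 0.0


def _latest_reference():
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT model_path, created_at FROM models "
                "ORDER BY created_at DESC LIMIT 1"
            )
            return cur.fetchone()
    finally:
        conn.close()


def get_model():
    """Return the cached model, reloading it if a newer one has been published."""
    global _model, _model_updated_at, _last_check
    now = time.time()
    if _model is None or now - _last_check > CHECK_INTERVAL_S:
        _last_check = now
        model_path, updated_at = _latest_reference()
        if _model is None or updated_at != _model_updated_at:
            _model = tf.keras.models.load_model(model_path)
            _model_updated_at = updated_at
    return _model
```

The check interval bounds how stale an instance can be while keeping the database lookup off the hot path most of the time.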
You have several solutions; it all depends on what you want.
But there is another solution, and it's my preference: every time you have a new model, build a new container with the model already baked into it (with Cloud Build) and deploy that new container to Cloud Run.
That solution solves your cache-management issue, and you get better cold-start latency for all your new instances (in addition to easier rollback, A/B testing or canary-release capability, version management and control, portability, local/other-environment testing, ...).

Related

Build an extensible system for scraping websites

Currently, I have a server running. Whenever I receive a request, I want some mechanism to start the scraping process on some other resource (preferably dynamically created), as I don't want to perform scraping on my main instance. Further, I don't want the other instance to keep running and charging me when I am not scraping data.
So, preferably a system that I can ask to start scraping the site and that shuts down when it finishes.
Currently, I have looked at Google Cloud Functions, but they are capped at 9 minutes per invocation, so they won't fit my requirement, as scraping would take much more time than that. I have also looked at the AWS SDK; it lets us create VMs at runtime and also shut them down, but I can't figure out how to push my API script onto the newly created AWS instance.
Further, the system should be extensible. I have many different scripts that scrape different websites, so a robust solution would be ideal.
I am open to using any technology. Any help would be greatly appreciated. Thanks.
I can't figure out how to push my API script onto the newly created AWS instance.
This is achieved by using UserData:
When you launch an instance in Amazon EC2, you have the option of passing user data to the instance that can be used to perform common automated configuration tasks and even run scripts after the instance starts.
So basically, you would construct your UserData to install your scripts and all their dependencies and then run them. This is executed when new instances are launched.
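A rough sketch of that with boto3; the AMI ID, repository URL, and script names are placeholders:

```python
# Launch a throwaway scraper instance whose UserData installs and runs the
# scraper, then shuts the instance down. All IDs/URLs below are placeholders.
import boto3

USER_DATA = """#!/bin/bash
yum install -y python3 git
git clone https://github.com/your-org/your-scraper.git /opt/scraper
pip3 install -r /opt/scraper/requirements.txt
python3 /opt/scraper/scrape.py --site example.com
shutdown -h now   # stop paying once the scrape finishes
"""

ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder Amazon Linux AMI
    InstanceType="t3.small",
    MinCount=1,
    MaxCount=1,
    UserData=USER_DATA,
    # Terminate (not just stop) on shutdown so no resources linger.
    InstanceInitiatedShutdownBehavior="terminate",
)
print(response["Instances"][0]["InstanceId"])
```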
If you want the system to be scalable, you can launch your instances in an Auto Scaling group and scale it up or down as you require.
The other option is running your scripts as Docker containers, for example using AWS Fargate.
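If you go the Fargate route, kicking off a one-shot scrape from your main server can be as small as this sketch (the cluster, task definition, container, and subnet names are placeholders and assume an image with your scraper already built and registered):

```python
# Start a one-off Fargate task that runs the scraper container and exits.
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")
ecs.run_task(
    cluster="scraper-cluster",            # placeholder cluster name
    launchType="FARGATE",
    taskDefinition="scraper-task:1",      # placeholder task definition
    count=1,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],
            "assignPublicIp": "ENABLED",
        }
    },
    overrides={
        "containerOverrides": [
            {
                "name": "scraper",        # container name in the task definition
                "command": ["python", "scrape.py", "--site", "example.com"],
            }
        ]
    },
)
```

The task stops (and stops billing) as soon as the container's command exits.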
By the way, AWS Lambda has a limit of 15 minutes, so not much more than Cloud Functions.

How to run a migration on Google App Engine

I have a Node.js app running on Google App Engine.
I want to run sequelize migrations.
Is it possible to run a command from within the Instance of my node.js app?
Essentially something like Heroku's run command, which will run a one-off process inside a Heroku dyno.
If this isn't possible what's the best practice in running migrations?
I could always just add it to gcp-build, but that will run on every deploy.
It's not possible to run standalone scripts/apps in GAE; see How do I run custom python script in Google App engine (asked in a Python context, but the general idea applies to all runtimes).
The way I ran my (datastore) migrations was to port the functionality of the migration script itself into the body of an admin-protected handler in my GAE app, which I triggered with an HTTP request to a particular URL. I reworked it a bit to split the potentially long-running migration operation into a sequence of smaller operations (using push task queues), which is much more GAE-friendly. This allowed me to live-test the migration one datastore entity set at a time and only move on to multiple sets once I was completely confident in its operation. I also didn't have to worry about eventual consistency (I was using queries to determine the entities to be migrated); I just repeatedly invoked the migration until there was nothing left to do.
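The shape of such a handler, sketched very roughly for a Flask-based GAE app (migrate_one_batch() is a stand-in for the ported migration logic, and the project, region, and queue names are illustrative):

```python
# Admin/task-only handler that migrates one small batch per request and
# re-enqueues itself until nothing is left to do.
from flask import Flask, abort, request
from google.cloud import tasks_v2

app = Flask(__name__)

BATCH_SIZE = 100


def migrate_one_batch(batch_size):
    """Your ported migration logic; returns how many entities it migrated."""
    raise NotImplementedError


def enqueue_next_batch():
    """Queue another run of this handler via a push task queue (Cloud Tasks)."""
    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path("my-project", "us-central1", "migrations")
    client.create_task(
        parent=parent,
        task={
            "app_engine_http_request": {
                "http_method": tasks_v2.HttpMethod.POST,
                "relative_uri": "/admin/migrate",
            }
        },
    )


@app.route("/admin/migrate", methods=["POST"])
def migrate():
    # Only accept requests coming from the task queue or cron (App Engine
    # strips these headers from external traffic); add IAP/admin checks too.
    if not (request.headers.get("X-AppEngine-QueueName")
            or request.headers.get("X-AppEngine-Cron")):
        abort(403)
    migrated = migrate_one_batch(BATCH_SIZE)
    if migrated:
        enqueue_next_batch()
    return f"migrated {migrated} entities", 200
```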
Once the migration was completed I removed the respective code (but kept the handler itself for future migrations). As a positive side effect I pretty much had the migration history captured in my repository's history itself.
Potentially of interest: Handling Schema Migrations in App Engine

What is a good way to keep a small amount of state between Heroku deployments?

I have a small Node.js app running on Heroku that currently does not need or have any persistent storage of state.
However, I want to add a feature that requires that a few very small pieces of state (less than 1KB of data) be persisted between deployments.
What is the best way for me to add this state to my Heroku app, while still retaining the ability for this app to be easily deployed with the "Deploy to Heroku" button?
So far the only potential solution I see would involve attaching a free PostgreSQL addon, which seems like massive overkill.
Add-ons like Postgres and Redis can still be used with the Heroku Deploy button. Check out this example for how initial table building (via rake and the like) works: https://blog.heroku.com/introducing_the_app_json_application_manifest

FluentMigrator Migration From Application_Start

I am currently changing our database deployment strategy to use FluentMigrator and have been reading up on how to run it. Some people have suggested that it can be run from Application_Start. I like this idea, but other people are saying no without specifying reasons, so my questions are:
Is it a bad idea to run the database migration on application start, and if so, why?
We are planning to move our sites to Azure Cloud Services, and if we don't run the migration from Application_Start, how and when should we run it, considering we want to make the deployment as simple as possible?
Wherever it is run, how do we ensure it runs only once? We will have a website and multiple worker roles as well (although we could just ensure the migration code is only called from the website, but in the future we may increase to 2 or more instances; would that mean it could run more than once?)
I would appreciate any insight into how others handle the migration of the database during deployment, particularly from the perspective of deployments to Azure Cloud Services.
EDIT:
Looking at the comment below, I can see the potential problems of running during Application_Start. Perhaps the issue is that I am trying to solve the problem with the wrong tool. FluentMigrator may not be the way to go in our case, as we have a large number of stored procedures, views, etc., so as part of the migration I would have to use SQL scripts to keep them at the right version, and migrating down would probably not be possible.
What I liked about the idea of running during Application_Start was that I could build a single deployment package for Azure, upload it to staging, have the database migration run automatically, and then just swap into production, rather than running manual scripts.
Running migrations during Application_Start can be a viable approach. Especially during development.
However there are some potential problems:
Application_Start will take longer, and FluentMigrator will run every time the app pool is recycled. Depending on your IIS configuration, this could be several times a day.
If you do this in production, users might be affected, i.e. trying to access a table while it is being changed will result in an error.
DBAs don't usually approve.
What happens if the migrations fail on startup? Is your site down then?
My opinion:
For a site with a decent amount of traffic, I would prefer to have a build script and more control over when I change the database schema. For a hobby (or small, non-critical) project this approach would be fine.
An alternative approach that I've used in the past is to make your migrations non-breaking: write them in such a way that they can be deployed before any code changes and still work with the existing code. This way, code and migrations can be deployed independently 95% of the time. For example, instead of changing an existing stored procedure you create a new one, or if you want to rename a table column you add a new one.
The benefits of this are:
Your database changes can be applied before any code changes. You're then free to roll back any breaking code changes or breaking migrations.
Breaking migrations won't take the existing site down.
DBAs can run the migrations independently.

Automated migrations for Azure Blob Storage. Does this exist?

We are using Azure Blob Storage in all our projects. Over the lifetime of a project, the naming conventions for files in Azure change: sometimes we would like to rename containers, remove extra folders, and perform other clean-up operations.
But Azure does not make renaming easy; we have to copy and then delete.
Also, we can change a naming convention locally, during development, but we then need to remember to perform the exact same operation on production storage when we deploy new versions.
At the same time, we use Entity Framework migrations: we update the database and a migration script is created. Then we run "update-database" and the DB is updated. The same happens automatically in deployment scripts: check whether the production DB needs to be updated, and update it if needed.
It would be good if we could have the same migration goodness for Azure Storage: check whether all the migration scripts have been applied, execute processes for any missing scripts, and keep a reference to the latest executed script somewhere in the containers.
Does such a thing exist? Or should I have a go at implementing something myself?
No, such functionality/behavior does not exist. And do remember that EF migrations are supported by and are part of EF itself, not the database! So when you talk about Azure Blob Storage: it, as a service, does not provide such functionality, the same way SQL Server itself does not.
As to whether such a library/code exists: no, there isn't one.
You are raising a very interesting question, though!
I personally am not a big fan of "migrations". You can use them in the early stages of the development life cycle, but once you hit GA/production you have to be very careful about what you are doing. Even EF migrations might be fine with small database sizes, but are you willing to run migrations on a DB whose tables hold millions of rows of production data? Same with blobs: 100 or 1,000 blobs might be fine, but how about 2 million? Are you really willing to put in code that goes through 2 million entities, performs some operation on each, and runs as part of your build/deploy process? I would not.
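That said, if you do decide to have a go at it, the book-keeping half is small. A minimal sketch using the azure-storage-blob Python SDK; the container/blob names and the migration registry are illustrative, and the actual copy/delete work is left to per-script callables:

```python
# Track which blob "migration scripts" have already been applied by keeping a
# marker blob with one applied script name per line. Names are illustrative,
# and the 'migrations' container is assumed to exist already.
import os

from azure.core.exceptions import ResourceNotFoundError
from azure.storage.blob import BlobServiceClient

MARKER_CONTAINER = "migrations"
MARKER_BLOB = "applied.txt"


def run_pending_migrations(all_migrations):
    """all_migrations: ordered mapping of {script_name: callable(service)}."""
    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    )
    marker = service.get_blob_client(MARKER_CONTAINER, MARKER_BLOB)
    try:
        applied = set(marker.download_blob().readall().decode().splitlines())
    except ResourceNotFoundError:
        applied = set()

    for name, migrate in all_migrations.items():
        if name in applied:
            continue
        migrate(service)   # e.g. copy blobs to their new names, then delete
        applied.add(name)
        marker.upload_blob("\n".join(sorted(applied)), overwrite=True)
```

Whether the enumerate-copy-delete work inside each script is acceptable at your blob counts is exactly the concern raised above.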
