Automated migrations for Azure Blob Storage. Does this exist? - azure

We are using Azure Blob Storage in all our projects. Through lifetime of a project the naming convention for files in Azure change: sometimes we would like to rename containers, remove extra folders and other clean-up operations.
But Azure does not allow easily to rename things, we have to do copy-delete.
Also we can change naming convention locally, during development. But we need to remember do the exact operation on production storage when we deploy new versions.
At the same time we use Entity Framework migrations: we updated database, migration script is created. Then we run "update-database" and DB is updated. The same is run automatically by deployment scripts: check if production DB needs to be updated, and update it if needed.
What would be good if we can do the same migration goodness for Azure storage: check if all the migration scripts have been applied, execute processes for missing scripts. Somewhere in the containers keep a reference to a latest executed script.
Does such thing exist? or should I have a go on it and try implementing something myself.

No, such functionality/behavior does not exists. And do remember that EF migrations are supported and are part of the EF itself, not the Data Base! So when you talk about Azure Blob Storage - it, as a service does not provide such functionality, the same way SQL Server itself does not do it.
To the question if such a library/code exists - no there isn't.
You are raising a very interesting question though!
I personally am not a big fan of "migrations". You can do it while in early stages of development life cycle. But once you hit GA/Production, you have to be very careful what you are doing. Even EF migrations might be good with small database sizes, but are you willing to run migrations on a DB which has tables with millions of records production data? Same with blobs. If you have 100 or 1000 blobs might be fine. How about 2M blobs? Are you really willing to put some code that would go through 2M entities and do some operations over it, and run this code as part of your build/deploy process? I would not.

Related

How to reload tensorflow model in Google Cloud Run server?

I have a webserver hosted on cloud run that loads a tensorflow model from cloud file store on start. To know which model to load, it looks up the latest reference in a psql db.
Occasionally a retrain script runs using google cloud functions. This stores a new model in cloud file store and a new reference in the psql db.
Currently, in order to use this new model I would need to redeploy the cloud run instance so it grabs the new model on start. How can I automate using the newest model instead? Of course something elegant, robust, and scalable is ideal, but if something hacky/clunky but functional is much easier that would be preferred. This is a throw-away prototype but it needs to be available and usable.
I have considered a few options but I'm not sure how possible either of them are:
Create some sort of postgres trigger/notification that the cloud run server listens to. Guess this would require another thread. This ups complexity and I'm unsure how multiple threads works with Cloud Run.
Similar, but use a http pub/sub. Make an endpoint on the server to re-lookup and get the latest model. Publish on retrainer finish.
could deploy a new instance and remove the old one after the retrainer runs. Simple in some regards, but seems riskier and it might be hard to accomplish programmatically.
Your current pattern should implement cache management (because you cache a model). How can you invalidate the cache?
Restart the instance? Cloud Run doesn't allow you to control the instances. The easiest way is to redeploy a new revision to force the current instance to stop and new ones to start.
Setting a TTL? It's an option: load a model for XX hours, and then reload it from the source. Problem: you could have glitches (instances with new models and instances with the old one, up to the cache TTL expires for all the instances)
Offering cache invalidation mechanism? As said before, it's hard because Cloud Run doesn't allow you to communicate with all the instances directly. So, push mechanism is very hard and tricky to implement (not impossible, but I don't recommend you to waste time with that). Pull mechanism is an option: check a "latest updated date" somewhere (a record in Firestore, a file in Cloud Storage, an entry in CLoud SQL,...) and compare it with your model updated date. If similar, great. If not, reload the latest model
You have several solutions, all depend on your wish.
But you have another solution, my preference. In fact, every time that you have a new model, recreate a new container with the new model already loaded in it (with Cloud Build) and deploy that new container on Cloud Run.
That solution solves your cache management issue, and you will have a better cold start latency for all your new instances. (In addition of easier roll back, A/B testing or canary release capability, version management and control, portability, local/other env testing,...)

Using the temp directory for Azure Functions

I have a set of Azure functions running on the same host, which scales up to many instances at times. I'd like to store a very small amount of ephemeral data (a few kb's) and opportunistically share those data between function executions. I know that the temp directory is only available to the functions running on that same instance. I also know that I could use the home directory, durable functions, or other Azure (such as blob) storage to share data between all functions persistently.
I have two main questions
What are the security implications of using the temp directory? Who can access its contents outside of the running function?
Is this still a reasonable solution? I can't find much in the way of Microsoft documentation outside of what looks like some outdated kudu documentation here.
Thanks!
Answer to Question 1
Yes, it is secure. The Function host process runs inside a sandbox. All access data stored to D:\local is self-contained and isolated to the processes within the sandbox. Kindly see https://github.com/projectkudu/kudu/wiki/Azure-Web-App-sandbox
Answer to Question 2
The data in D:\local\Temp exists as long as the Function host process is alive. The Functions host process can be recycled at any time due to unexpected events such as unhandled exceptions, timeouts, hitting resource usage limits for your plan. As long as your workflow accounts for the fact that the data stored in D:\local\Temp is ephemeral, then the answer is a 'yes'.
I believe this will answer your question :
Please refer to this for more details.
Also, when Folder/Files when created via code inside the “Temp” folder; you cannot view them when you visit KUDU site. But you can use those files/ folders.
How to view the files/ folders if created via KUDU?
We will need to add - WEBSITE_DISABLE_SCM_SEPARATION = true in Configuration(app settings).
Note:- Another important note is that the Main site and the scm site do not share temp files. So if you write some files there from your site, you will not see them from Kudu Console (and vice versa).
You can make them use the same temp space if you disable separation (via WEBSITE_DISABLE_SCM_SEPARATION).
But note that this is a legacy flag, and its use is not recommended/supported.
(ref : shared document link)
Security implications depend on the level of isolation you are seeking.
In shared app-service plan or consumption plan you need to trust the sandbox isolation. This is not an isolated microvm like AWS lambda.
If you have your own app-service plan then you need to trust the VM hypervisor isolation of your app-service plan.
If you are really paranoid or running healtcare application, then you likely need to run your function in a ASE plan.
Reasonable solution is one where the cost is not exceeding the worth of data you are protecting :)

FluentMigrator Migration From Application_Start

I am currently changing our database deployment strategy to use FluentMigration and have been reading up on how to run this. Some people have suggested that it can be run from Application_Start, I like this idea but other people are saying no but without specifying reasons so my questions are:
Is it a bad idea to run the database migration on application start and if so why?
We are planning on moving our sites to deploying to azure cloud services and if we don't run the migration from application_start how should/when should we run it considering we want to make the deployment as simple as possible.
Where ever it is run how do we ensure it is running only once as we will have a website and multiple worker roles as well (although we could just ensure the migration code is only called from the website but in the future we may increase to 2 or more instances, will that mean that it could run more than once?)
I would appreciate any insight on how others handle the migration of the database during deployment, particularly from the perspective of deployments to azure cloud services.
EDIT:
Looking at the comment below I can see the potential problems of running during application_start, perhaps the issue is I am trying to solve a problem with the wrong tool, if FluentMigrator isn't the way to go and it may not be in our case as we have a large number of stored procedures, views, etc. so as part of the migration I was going to have to use SQL scripts to keep them at the right version and migrating down I don't think would be possible.
What I liked about the idea of running during Application_Start was that I could build a single deployment package for Azure and just upload it to staging and the database migration would be run and that would be it, rather thank running manual scripts, and then just swap into production.
Running migrations during Application_Start can be a viable approach. Especially during development.
However there are some potential problems:
Application_Start will take longer and FluentMigrator will be run every time the App Pool is recycled. Depending on your IIS configuration this could be several times a day.
if you do this in production, users might be affected i.e. trying to access a table while it is being changed will result in an error.
DBA's don't usually approve.
What happens if the migrations fail on startup? Is your site down then?
My opinion ->
For a site with a decent amount of traffic I would prefer to have a build script and more control over when I change the database schema. For a hobby (or small non-critical project) this approach would be fine.
An alternative approach that I've used in the past is to make your migrations non-breaking - that is you write your migrations in such a way they can be deployed before any code changes and work with the existing code. This way both code and migrations both can be deployed independently 95% of the time. For example instead of changing an existing stored procedure you create a new one or if you want to rename a table column you add a new one.
The benefits of this are:
Your database changes can be applied before any code changes. You're then free to roll back any breaking code changes or breaking migrations.
Breaking migrations won't take the existing site down.
DBAs can run the migrations independently.

Quicker backup in windows azure

We are using windows azure for a software and when we release a new version of the system we usually takes the site down and we take a backup with the code below. (CREATE DATABASE databaseCopy AS COPY OF Database;)
The backup is taken to ensure that nothing goes wrong and that we can rollback to the latest version if we have created a bigger bug. However this takes a lot of time. (Hours) Is there a way to do a copy faster? We dont have any active users at the time for the release so maybe you can do it faster in another way? If not how do you usually do you usually do your database-upgrades?
Alternatively you can use SQL Database Import Export Service - http://msdn.microsoft.com/en-us/library/windowsazure/jj650016.aspx
NOTE: There are many other ways as mentioned here - http://blogs.msdn.com/b/wats/archive/2013/03/04/different-ways-to-backup-your-windows-azure-sql-database.aspx

Sharepoint disaster recovery

What are your disaster recovery plans for Windows Sharepoint Services 3.0 ?
Currently we are backuping all databases (1 content, admin, search and config) using sql backup tools, and backuping the front end server via dataprotector.
To test our backups, we use another server farm, restore the content database (following the procedure on technet) and create a new application that uses this database. We just have to redeploy solutions on the newly created sharepoint application.
However, we have to change database access credentials (on sql server) : the user accounts used on production aren't the same as those used on our "test" farm.
At the end, we can restore our content database and access all our sites. Searching doesn't work, but we're investigating.
Is this restore scenario reliable (as in supported by microsoft) ?
You can't really backup / restore both config database and search database:
restoring config database only work if your new farm have exactly the same server names
when you restore the search database, the fulltext index is not synchronize. however, this is not a problem as you can just reindex.
As a result, I would say that yes, this a reliable for content. But take care of:
You may have to redo some configuration (AAM, managed path...).
This does not include customization, you want to keep a backup of your solution
Reliability is in the eye of the beholder. In this case, if your tests of the restore process is successful, then yes, it is reliable.
A number of my clients run SharePoint (both MOSS and WSS) in virtual environments, SQL Server is also virtualised and backed up both with SQL tools and with Volume Shadow copy.
The advantage of a Virtual Environment is downtime is only as long as it takes your Virtual Server host to boot the images.
If you are not using Virtualisation, then remember to backup transaction logs regularly as this will make it easier to restore to a given point in the day - it also means that your transaction logs dont grow too big!
I prefer to use the stsadm -o backup command 'for catastrophic backup' as it says in the help. This can be scheduled but requires some maintenance of the backup metadata XML file when you start running out of disk space and need to archive older backups. It has the advantage of transferring over timer jobs (usually) and other configuration because as Nico says, restoring the config database won't work for most situations.
To restore, you can use the user interface which is nice and not have to mess around with much else. I think it restores your solutions as well but haven't tested that extensively.

Resources