Is there any way to build and deploy Azure Batch Application Packages when changes are pushed to Bitbacket repository?
I'm looking for the same Deployment approach as for Azure Functions or something like this.
To start with this is what I think on top of my head.
Cool, I will share some information and thoughts around this, I am sure you can make use of the information to help your idea.
There are 2 levels of application package:
Pool level; and
Task level
Detail information here: https://learn.microsoft.com/en-us/azure/batch/batch-application-packages
The pool level is set at the pool level and available to any task joining the pool where as the task level gets unpacked at the creation of task.
Please be aware of the max limits of pkgs etc:
* https://learn.microsoft.com/en-us/azure/batch/batch-quota-limit#other-limits
Key
AFAIK, there is no flag which can tell the vm that the current pkg is updated, hence in your scenario 2 things can happen:
pool level Scenario: if you are joining the pool every time: If you can afford to create pool i.e. at the join pool level then you can keep the package name and every time the code is updated you can recreate the pool which will end up creating the whole thing again i.e. the new package will get picked up.
Task level: If you don't want to create pool all the time, then you can do it by creating new task every time your code gets changed, not the caveat to that will be the max limit which is described at the link above.
Both ways you can do it via user code but deciding between what scenario will suite you depends on the grand architecture of the case.
Information flow possibility at user end
some resource in bit bucket.
User at any change in that resource ==> packs it in *.zip format and then carry on with the batch side of things.
User create the pool or mention task level packages (depending on the detail above); can also add the versions for the same package (beaware of the max limits here)
pkg is available in vm.
Alternate approach:
There is another way this can be done which is non package way:
Mount drive to the node at start task:
And user code has to make user that drive always gets updated will latest version of “*.files”
I hope this help your scenario \ design :), thanks!
Related
I have a continuous Webjob running on my Azure Website. It is responsible for doing some work after retrieving items from a QueueTrigger. I am attempting to increase the rate in which the items are processed off the Queue. As I scale out my App Service Plan, the processing rate increases as expected.
My concern is that it seems wasteful to pay for additional VMs just to run additional instances of my Webjob. I am looking for options/best practices to run multiple instances of the same Webjob on a single server.
I've tried starting multiple JobHosts in individual threads within Main(), but either that doesn't work or I was doing something wrong... the Webjob would fail to run due to what looks like each thread trying to access 'WebJobSdk.marker'. My current solution is to publish my Webjob multiple times, each time modifying 'webJobName' slightly in 'webjob-publish-settings.json' so that the same project is considered a different Webjob at publish time. This works great so far, expect that it creates a lot of additional work each time I need to make any update.
Ultimately, I'm looking for some advice on what the recommended way of accomplishing this would be. Ideally, I would like to get the multiple instances running via code, and only have to publish once when I need to update the code.
Any thoughts out there?
You can use the JobHostConfiguration.QueuesConfiguration.BatchSize and NewBatchThreshold settings to control the concurrency level of your queue processing. The latter NewBatchThreshold setting is new in the current in progress beta1 release. However, by enabling "prerelease" packages in your Nuget package manager, you'll see the new release if you'd like to try it. Raising the NewBatchThreshold setting increases the concurrency level - e.g. setting it to 100 means that once the number of currently running queue functions drops below 100, a new batch of messages will be fetched for concurrent processing.
The marker file bug was fixed in this commit a while back, and again is part of the current in progress v1.1.0 release.
What is the best way to do a one-time job on Azure?
Say we want to extend a table in the associated database with a double column. All the new entries will have this value computed by the worker(s) at insertion, but somebody has to take care of the entries that are already in the table. I thought of two alternatives:
a method called by the worker only if a database entry (say, "JobRun") is set to true, and the method would flip the entry to false.
a separate app that does the job, and which is downloaded and run manually using remote desktop (I cannot connect the local app to the Azure SQL server).
The first alternative is messy (how should I deal with the code at the next deployment? delete it? comment it? leave it there? also, what if I will have another job in the future? create a new database entry "Job2Run"?). The second one looks like a cheap hack. I am sure that there is a better way I could not think of.
If you want to run a job once you'll need to take into account the following:
Concurrency: While the job is running, make sure no other worker picks up the job and runs it at the same time (you can use leases for this. More info here).
Once the job is done, you'll need to keep track (in Table Storage, SQL Azure, ...) that the job completed successfully. The next time a worker tries to pick up the job, it will look in Table Storage / SQL Azure / ..., it will see that the job completed and skip the job.
Failure: Maybe your worker crashes during the job which should allow another worker to pick up the job without any issue.
In your specific use case I would also consider using a tool like dbup to manage updates to your schema and existing data with SQL Scripts. Tools like these keep track of which scripts have been executed by adding them in a table in the database.
The scenario is simple: using EF code first migrations, with multiple azure website instances, decent size DB like 100GB (assuming azure SQL), lots of active concurrent users..say 20k for the heck of it.
Goal: push out update, with active users, keep integrity while upgrading.
I've sifted through all the docs I can find. However the core details seem to be missing or I'm blatantly overlooking them. When Azure receives an update request via FTP/git/tfs, how does it process the update? What does it do with active users? For example, does it freeze incoming requests to all instances, let items already processing finish, upgrade/replace each instance, let EF migrations process, then let traffics start again? If it upgrades/refreshes all instances simultaneously, how does it ensure EF migrations run only once? If it refreshes instances live in a rolling upgrade process (upgrade 1 at a time with no inbound traffic freeze), how could it ensure integrity since instances in the older state would/could potentially break?
The main question, what is the real process after it receives the request to update? What are the recommendations for updating a live website?
To put it simply, it doesn't.
EF Migrations and Azure deployment are two very different beasts. Azure deployment gives you a number of options including update and staging slots, you've probably seen
Deploy a web app in Azure App Service, for other readers this is a good start point.
In General the Azure deployment model is concerned about the active connections to the IIS/Web Site stack, in general update ensures uninterrupted user access by taking the instance being deployed out of the load balancer pool and redirecting traffic to the other instances. It then cycles through the instances updating one by one.
This means that at any point in time, during an update deployment there will be multiple versions of your code running at the same time.
If your EF Model has not changed between code versions, then Azure deployment works like a charm, users won't even know that it is happening. But if you need to apply a migration as part of the migration BEWARE
In General, EF will only load the model if the code and DB versions match. It is very hard to use EF Migrations and support multiple code versions of the model at the same time
EF Migrations are largely controlled by the Database Initializer.
See Upgrade the database using migrations for details.
As a developer you get to choose how and when the database will be upgraded, but know that if you are using Mirgrations and deployment updates:
New code code will not easily run against the old data schema.
If the old code/app restarts many default initialization strategies will attempt roll the schema back, if this happens refer to point 1. ;)
If you get around the EF model loading up against the wrong version of the schema, you will experience exceptions and general failures when the code tries to use schema elements that are not there
The simplest way to manage a EF migration on a live site is to take all instances of the site down for deployments that include an EF Migration
- You can use a maintenance page or a redirect, that's up to you.
If you are going to this trouble, it is probably best to manually apply the DB update, then if it fails you can easily abort the deployment, because it hasn't started yet!
Otherwise, deploy the update and the first instance to spin up will run the migration, if the initializer has been configured to do so...
If you absolutely must have continuous deployment of both site code/content and model updates then EF migrations might not be the best tool to get started with as you will find it very restrictive OOTB for this scenario.
I was watching a "Fundamentals" course on Pluralsight and this was touched upon.
If you have 3 sites, Azure will take one offline and upgrade that, and then when ready restart it. At that point, the other 2 instances get taken off-line and your upgraded insance will start, thus running your schema changes.
When those 2 come back the EF migrations would already have been run, thus your sites are back.
In theory then it all sounds like it should work, although depending upon how much EF migrations need running, requests may be delayed.
However, the comment from the author was that in this scenario (i.e. making schema changes) you should consider if your website can run in this situation. The suggestion being that you either need to make your code work with both old and new schemas, or show a "maintenance system down page".
The summary seems to be that depending on what you are actually upgrading, this will impact and affect your choices and method of deployment.
Generally speaking if you want to support active upgrades you need to support multiple version of you application simultaneously. This is really the only way to reliably stay active while you migrate/upgrade. Also consider feature switches to scale up your conversion in a controlled manner.
I am trying to port an application to azure platform. I want to run an existing application multiple times. My initial idea is as follows: I have a master_process. I have many slave_processes. Each process is a worker role in Azure. Each slave_process will run an instance of the application independently. I want master_process to start many slave_processes and provide them the input arguments. At the end, master_process will collect the results. Currently, I have a working setup for calling the whole application from a C# wrapper. So, for the success, I need two things: First, I have to find a way to start slave workers inside of a master worker (just like threads). Second, I need to find a way to store results of the slave workers and reach these result files from master worker. Can anyone help me?
I think I would try and solve the problem differently. Deploying a whole new instance can take 15 to 30 minutes. Adding extra instances to an already running worker role is a little quicker, but not by much. I'm going to presume that you want results faster than that and that this process is something that is run frequently.
I would have just one worker role type that runs your existing logic and as many instances of that worker role that you determine you'll need. Whatever your client is will decide that it needs to break the work up in a certain number of pieces, let's say 10 for the sake of argument. It will give each piece of work an ID (e.g. a guid) and then put 10 messages that contain the parameters and the ID into a queue. Your worker role instances take messages out of the queue, do their work and write their results to storage somewhere (either SQL Azure, Azure Table Storage or maybe even blob storage depending on what the results are). The client polls that storage to wait for all of the results to be complete and then carries on.
If this is a process that is only run infrequently, then rather than having the worker roles deployed all of the time, you could use the same method I've described, but in addition get the client code to deploy the worker roles when it starts and then delete them when it's finished through the management API. There are samples on MSDN on how to use this.
I have a similar situation you might find useful:
I have a large sequential batch process I run in Azure which requires pre and post processing. The technique I used was to use instances of a single multifunctional worker role, but to use a "quorum" to nominate a head node, which then controls the workflow.
They way I do it is using an azure page blob as the quorum (basically a kind of global mutex/lock), because once a node grabs it for writing it's locked. For resilience, in case there's an issue with the head node, all nodes occasionally try to recapture the quorum.
I want to build an Azure application that has two worker roles and NO web roles. When the worker roles first start up I want ONLY ONE of the roles to do the following a single time:
Download and parse a master file then enqueue multiple "child" tasks based on the
contents of the master file
Enqueue a single master file download "child" task to run the next day
Each of the "child" tasks would then be done by both of the workers until the task queue was exhausted. Think of the whole things as "priming the pump"
This sort of thing is really easy if I add the the first "master" task manually in a queue by calling a web role but seems to be really hard to do in an auto-start mode.
Any help in this regard would be greatly appreciated!
Thanks.....
One possibility: instead of calling a web role, just load the queue directly. (It sounds like this is the sort of application you'll want to automatically spin up to do some work and then shut down again... if you're automating that, it should be trivial to also automate loading the queue.)
A (perhaps) better option: Use some sort of locking mechanism to make sure only one worker instance does the initialization work. One way to do this is to try to create the queue (or a blob, or an entity in a table). If it already exists, then the other instance is handling initialization. If the create succeeds, then it's this instance's job.
Note that it's always better to use a lease than a lock, in case the instance that's doing the initialization fails. Consider using a timeout (e.g. storing a timestamp in table storage or in the metadata of the blob or in the name of the queue...).
We did end-up with the exact same sort of problem, that's why we introduced a O/C mapper (object to cloud). Basically, you want to introduce two types of cloud services:
QueueService that consumes messages whenever available.
ScheduledService that triggers operations on a scheduled basis.
Then, as others suggested, in the cloud, you really prefer using leases instead of locks, in order to avoid your cloud app to end up freezed forever due to a temporary hardware (or infrastructure) issue.