Azure WebJobs for Aggregation - azure

I'm trying to figure out a solution for recurring data aggregation of several thousand remote XML and JSON data files, by using Azure queues and WebJobs to fetch the data.
Basically, an input endpoint URL of some sort would be called (with a data URL as a parameter) on an Azure website/app. It should trigger a WebJobs background job (or could the job run continuously and check the queue periodically for new work?), fetch the data URL, and then call back an external endpoint URL on completion.
Now the main concern is the volume and its performance/scaling/pricing overhead. There will be around 10,000 URLs to be fetched every 10-60 minutes (most URLs will be fetched once every 60 minutes). With regard to this scenario of recurring high-volume background jobs, I have a couple of questions:
Is Azure WebJobs (or Workers?) the right option for background processing at this volume, and will it be able to scale accordingly?
For this sort of volume, which Azure website tier will be most suitable (comparison at http://azure.microsoft.com/en-us/pricing/details/app-service/)? Or would only a Cloud Service or VM(s) work at this scale?
Any suggestions or tips are appreciated.

Yes, Azure WebJobs is an ideal solution to this. Azure WebJobs will scale with your Web App (formerly Websites). So, if you increase your web app instances, you will also increase your web job instances. There are ways to prevent this, but that's the default behavior. You could also set up autoscale to automatically scale your web app based on CPU or other performance rules you specify.
It is also possible to scale your web job independently of your web front end (WFE) by deploying the web job to a web app separate from the web app where your WFE is deployed. This has the benefit of not taking up machine resources (CPU, RAM) that your WFE is using while giving you flexibility to scale your web job instances to the appropriate level. Not saying this is what you should do. You will have to do some load testing to determine if this strategy is right (or necessary) for your situation.
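Before load testing, a back-of-the-envelope estimate can give a starting point for how many fetches need to be in flight at once. The sketch below uses the 10,000 URLs and the 10-60 minute intervals from the question; the 2-second average fetch time and the function name `required_concurrency` are assumptions for illustration only, so substitute your own measurements.

```python
import math

def required_concurrency(url_count, interval_minutes, avg_fetch_seconds):
    """Estimate the fetch rate and the number of fetches in flight at once.

    Concurrency is approximated as rate * average fetch time (Little's law).
    """
    rate_per_second = url_count / (interval_minutes * 60)
    return rate_per_second, math.ceil(rate_per_second * avg_fetch_seconds)

# avg_fetch_seconds=2.0 is an assumed value, not a measurement.
for interval in (60, 10):
    rate, in_flight = required_concurrency(10_000, interval, avg_fetch_seconds=2.0)
    print(f"every {interval} min: {rate:.2f} fetches/s, about {in_flight} concurrent fetches")
```

Under these assumptions the typical case (hourly) needs only a handful of concurrent fetches, while the worst case (everything every 10 minutes) needs several times more, which is the range a load test should cover.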
You should consider at least the Basic tier for your web app. That would allow you to scale out to 3 instances if you needed to and also removes the CPU and Network I/O limits that the Free and Shared plans have.
As for the queue, I would definitely suggest using the WebJobs SDK and letting the JobHost (from the SDK) invoke your web job function for you instead of polling the queue yourself. This is a really slick solution and frees you from having to write the infrastructure code to retrieve messages from the queue, manage message visibility, delete the message, etc. For a working example of this and a quick start on building your web job like this, take a look at the sample code the Azure WebJobs SDK Queues template punches out for you.
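To illustrate the shape of this pattern (a host that takes messages off a queue and invokes your function once per message), here is a minimal in-process sketch. It is not the WebJobs SDK, which is a .NET library; it only uses Python's standard `queue` and `threading` modules to mimic the idea. The names `process_message`, `run_host`, `fake_fetch`, and `record_callback` are hypothetical, and the fetch and callback are faked so the example runs without network access.

```python
import queue
import threading

def process_message(data_url, fetch, notify):
    """Handle one queue message: fetch the data URL, then call back with the outcome."""
    try:
        payload = fetch(data_url)
    except Exception as exc:  # report the failure instead of crashing the worker
        notify(data_url, "failed", str(exc))
    else:
        notify(data_url, "ok", len(payload))

def run_host(messages, fetch, notify, worker_count=4):
    """Stand-in for the host: dequeue messages and invoke the handler for each one."""
    work_queue = queue.Queue()
    for message in messages:
        work_queue.put(message)
    for _ in range(worker_count):
        work_queue.put(None)  # one stop sentinel per worker

    def worker():
        while True:
            data_url = work_queue.get()
            if data_url is None:
                return
            process_message(data_url, fetch, notify)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

# Demo with fake I/O (hypothetical URLs, simulated fetch and callback).
results = []
results_lock = threading.Lock()

def fake_fetch(url):
    if "bad" in url:
        raise ValueError("simulated fetch error")
    return b"<data/>"

def record_callback(url, status, detail):
    with results_lock:
        results.append((url, status, detail))

run_host(
    ["https://example.invalid/a.xml", "https://example.invalid/bad.json"],
    fake_fetch,
    record_callback,
)
print(sorted(results))
```

The point of the design is that your own code is only `process_message`; dequeuing, concurrency, and shutdown live in the host, which is the part the SDK provides for you.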

Related

Many triggers in single webjob

Does having many triggers (blob, Service Bus, timer) in a single WebJob reduce the performance of the WebJob?
Is there any way to improve the performance of a WebJob with many triggers?
Can a heavyweight WebJob be divided into smaller, lighter WebJobs?
Azure WebJobs is a feature of Azure App Service for running background jobs, as the official documentation says below.
WebJobs is a feature of Azure App Service that enables you to run a program or script in the same context as a web app, API app, or mobile app. There is no additional cost to use WebJobs.
Although there is no additional cost, WebJobs is a simple and useful feature that predates other similar and more powerful Azure services, such as Functions, which the same document introduces as follows.
Azure Functions provides another way to run programs and scripts. For a comparison between WebJobs and Functions, see Choose between Flow, Logic Apps, Functions, and WebJobs.
In the comparison document referenced above, the Summary section recommends the best scenarios for each option.
Summary
Azure Functions offers greater developer productivity, more programming language options, more development environment options, more Azure service integration options, and more pricing options. For most scenarios, it's the best choice.
Here are two scenarios for which WebJobs may be the best choice:
You need more control over the code that listens for events, the JobHost object. Functions offers a limited number of ways to customize JobHost behavior in the host.json file. Sometimes you need to do things that can't be specified by a string in a JSON file. For example, only the WebJobs SDK lets you configure a custom retry policy for Azure Storage.
You have an App Service app for which you want to run code snippets, and you want to manage them together in the same DevOps environment.
For other scenarios where you want to run code snippets for integrating Azure or third-party services, choose Azure Functions over WebJobs with the WebJobs SDK.
Meanwhile, in my experience on Azure, WebJobs and Functions are only suitable for simple, lightweight jobs. For high-performance requirements, the Azure Batch service is a good choice that balances cost and ease of use.
1. Does having many triggers (blob, Service Bus, timer) in a single WebJob reduce the performance of the WebJob?
Provided that your WebJob is not a singleton, you can have multiple functions with multiple triggers in it and performance will not drop (as long as your web app plan is beefy enough to handle all the load).
2. Is there any way to improve the performance of a WebJob with many triggers?
The best way would be to divide the WebJob into smaller WebJobs, each with a single trigger, and scale them out (add more instances) based on load. Also, if your WebJob finishes within 5 minutes, you can use an Azure Function App instead, which is a much better option. Alternatively, if execution takes more than 5 minutes, you can package your WebJob executable as a Docker image and provision it on demand from a Logic App using ACI. In this scenario you configure the triggers in the Logic App.
3. Can a heavyweight WebJob be divided into smaller, lighter WebJobs? Yes, see my previous answers.

Azure SFTP Logic App

I have an Azure Logic App that monitors an SFTP site for new files, and if it finds one, it sends a message to an Azure Queue for subsequent processing, then deletes the file. My application has grown in scale and a single logic app seems to only be grabbing 5-10 files a minute.
Is it possible to set up a second (third, fourth, etc.) Logic App that monitors the same SFTP site without the apps colliding with each other? I also see that there is a "High Throughput" setting that seems interesting, but I'm not sure it is what I need. My ultimate goal is to process more files faster, and I am considering swapping the Logic App for a scheduled WebJob that monitors the SFTP site. Since I am live and files are pouring in, I am a little reluctant to change anything until I know it's safe.
Any insight would be appreciated.
Thanks!!
Logic Apps falls under serverless architecture. If you select the pricing model based on "number of executions", performance can suffer, because Microsoft allocates shared resources for that pricing model and the processing runs on whichever server is free. I would recommend attaching a service plan to it and selecting the "per minute" pricing model.
One more point: if you need long-running operations, a Logic App is not the appropriate choice, but since you are connecting to enterprise integration, a Logic App is a good choice. I would recommend dividing this functionality between a Logic App and an Azure Function or Microsoft Flow.

Alternative for running a Windows service in the Azure cloud

We currently have a Windows service that sends notification emails to users after doing some processing on a database (SQL database). It runs once a day.
We want to move this to the Azure cloud. One alternative is to put it on an Azure VM as is, but I am looking for a better solution.
I have read about recurring and on-demand WebJobs, but I am not sure whether this is the best solution.
Also, is it possible to update the service's configuration in App.config without redeploying the service code to the cloud? I mean, can we manage the configuration from the Azure portal?
Thanks in advance.
Update 11/4/2016
Since this was written, there are 2 additional features available in Azure that are both excellent choices depending on what functionality you need:
Azure Functions (which was based on the WebJobs described below): Serverless code that can be triggered/invoked in various ways, and has scaling support.
Azure Service Fabric: Microservice platform, with support for actor model, stateful and stateless services.
You've got 3 basic options:
Windows service running on VM
WebJob
Cloud service
There's a lot of information out there on the tradeoffs between these choices, but here's a brief summary.
VM - Advantages: you can move your service basically as it is without having to change much or any of your code. VMs also have the easiest connectivity with other resources in Azure (blob storage, virtual networks, etc). The disadvantage is that you're giving up all of the PaaS advantages and are still stuck managing your own VM infrastructure.
WebJob - Advantages: Multiple invocation options (queues, blobs, manually, queue receive loops, continuous while-loop style, etc), scheduled (would cover your case). Easy to deploy (can go with a website, as a console app, automatically through Kudu), has some built-in logging in the Azure portal - and yes, to answer your question, you can alter the configuration in the portal itself for connection strings and app settings.
Disadvantages - you'll need to update code, you don't have access to underlying resources (if you need that), and more of something to keep in mind than a disadvantage - it uses the same resources as the webapp it's deployed with.
Web Jobs are the newest of the options, but at the same time appear to have active development going on to increase the functionality and usefulness.
Cloud Service - like a managed VM, has some deployment options, access to underlying VM if needed. Would require some code changes from your existing service.
There's nothing you've mentioned in your use case that makes me think a Web Job shouldn't be the first thing you try.
(Edit: Troy Hunt has a great and relatively recent blog post illustrating most of the points I've mentioned about Web Jobs above: http://www.troyhunt.com/2015/01/azure-webjobs-are-awesome-and-you.html)

How to build an auto-scaling Azure Cloud Service based on network usage?

Azure Cloud Services have auto-scale based on CPU / Queue. We have a set of machines running an API for uploading and processing files. Although we moved the processing part to a Worker Role that scales depending on the queue size, the servers still handle the uploads while also responding to other operations like downloading.
Right now we're using more machines for the just-in-case scenario, but we want to build a way to scale and be cost-efficient while still giving our users a great upload experience.
What would your approach be for creating a way to detect the network usage across all machines from the same Cloud Service and auto-scale if necessary?
I would:
1) Create metrics that calculate the amount of time it takes to download/upload a file.
2) Aggregate the metrics in some persistence layer (we have plenty in Azure)
3) Create a service that looks at those metrics
4) Check the thresholds
5) Use the Management Libraries for .NET to trigger scaling on the Cloud Service(s) affected.
This approach also scales with your solution. You can eventually separate the scaling part from the checking part and have them as two different services, communicating asynchronously.
We also have an old project, now open source, that does some of that for you, so you don't have to reinvent the wheel. It's called WASABi. Be careful though, as it is not maintained anymore, but as I said, you can use it as inspiration.
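As a rough illustration of steps 2 to 4 above (aggregate the metrics, check the thresholds, decide on a new instance count), here is a minimal sketch. The function name `desired_instance_count`, the threshold values, and the instance limits are all assumptions for illustration. The actual scaling call from step 5 is deliberately left out; the sketch only computes the target that would be handed to it.

```python
from statistics import mean

def desired_instance_count(current, upload_seconds, scale_out_above, scale_in_below,
                           min_instances=2, max_instances=10):
    """Return the target instance count based on recent upload durations.

    upload_seconds: recent upload durations (seconds) gathered from all instances.
    Scales by one instance at a time and clamps to [min_instances, max_instances].
    """
    if not upload_seconds:
        return current  # no data: leave the deployment alone
    average = mean(upload_seconds)
    if average > scale_out_above:
        target = current + 1
    elif average < scale_in_below:
        target = current - 1
    else:
        target = current
    return max(min_instances, min(max_instances, target))

# Hypothetical samples and thresholds: slow uploads suggest adding an instance.
print(desired_instance_count(3, [8.0, 9.5, 12.0], scale_out_above=6.0, scale_in_below=2.0))
```

Keeping the decision as a pure function makes it easy to test separately from the service that performs the scaling, which fits the idea of splitting checking and scaling into two services.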

Difference between Azure Web Jobs and Azure Scheduler in Microsoft Azure?

Can anybody explain the difference between Azure Web Jobs and Azure Scheduler?
Azure Web Jobs
Only available on Azure Websites
It is used to run code at particular intervals. E.g. a console application every day
Used to trigger and run workloads.
Mainly recommended for workloads that either scale with the website or are relatively small.
Can be persistently running if "Always On" is selected; otherwise you will get the 20 min timeout.
The code that needs to be run and schedule are defined together.
Azure Scheduler
Is not tied to Websites or Cloud Services
It allows you to call a website or add a message to a storage queue
Used for triggering events or triggering small workloads (e.g. add to queue), usually to trigger larger workloads
Mainly recommended for triggering more complex workloads.
This is only a trigger, and a separate function listening to trigger events (e.g. queues) needs to be coded separately.
In many cases I prefer to have the scheduler push to a storage queue and have a worker role on each instance take messages off the queue. This keeps tasks controlled granularly and lets them scale up or down independently of your website.
WebJobs scale up and down with your site, so your background tasks can become overtaxed if your website is experiencing low traffic and has been scaled down.
Azure Scheduler - Provides a way to easily schedule HTTP calls on a well-defined schedule, like every hour, every Friday at 9:00 am, once a day, ...
Azure WebJobs - Provides a way to run small to medium workloads (in the form of a script: .exe, .cmd, .sh, .js, ...) in the same context as an Azure Website (but can be hosted even with an empty website).
A WebJob can also run continuously (with a process that has a while loop), and Azure will make sure this WebJob is always running (with "Always On" set).
There is also an integration between Azure Scheduler and Azure WebJobs where you have a WebJob that is running some finite work and the scheduler is responsible for scheduling this work (invoking the WebJob).
So in summary, the scheduler is about scheduling work and WebJobs is about running work load.
