Azure: Orchestration file transfers - Which Azure components are best suited? - azure

I have a project in which we need to transfer files (mostly SFTP-based, but also HTTP), between 20+ systems. We have currently identified +200 different files that needs to be transferred. We would like a setup in Azure where the different transfers can be setup, monitored and logged, however, we are unsure which way to go.
The question is: Which Azure components would be best suited for the above task? Which components would you use?
One possible solution would be to implement at large set of Azure functions, each responsible for one file transfer. This would require us to setup the monitoring ourselves, and it will result in a very large number of functions.
We have also been looking towards Azure data factory and Azure Logic apps, but we are unsure if they would provide any benefits with regards to monitoring, re-running failed jobs etc.

As you already mentioned in your description,obviously, Azure Function is not suitable for your scenario because you have to build a large number of functions to do the transfer work.Moreover,it's painful to monitoring such scale of function executions.You need to distinguish the log data and persist them into table storage or something like that which causes more cost. So,it's passed!
In my opinion, ADF is the best solution for you.It could be monitored by many ways and it supports re-run feature,please follow this video.Also,another distinct feature for ADF is Self-Hosted Integration Runtime which supports transmission between on-premise system and Azure cloud environment.
As for Logic App,i'm can't find any re-run feature related to it so i don't think it could attract your attention.

Related

Is it possible to Monitor Azure Integration Runtime?

I am running few Data Pipelines in Azure Data Factory and its using Azure Integration Runtime for the compute.
I am trying to Monitor the CPU/Memory Usage Pipelines Consume and Utilise Azure IR.
I have checked in the Azure Monitor but the CPU / Memory Metrics are for Self Hosted Integration Runtime I think.
Also, with the Diagnostic Setting Enabled, I tried to verify the details in the Logs too but these details are not available.
Can anyone help to know more options?
If you are referring to the Azure AutoResolveIntegrationRuntime, then no there is not, and this is why (from https://www.cathrinewilhelmsen.net/integration-runtimes-azure-data-factory/)
Microsoft has massive elastic pools across the various locations/regions they offer Azure, and at runtime ADF determines what pool/hardware it will use to perform the Pipeline activities. So there is really no way (and no need) to monitor the Azure Autoresolve IR. But if you are interested in monitoring Self-Hosted IR's then there are many ways to do it.
One simple and straight forward way to do it is by creating Azure Dashboards in the Metrics portion of Azure Monitor. As you can see from the screenshot below it provides good visual representation of usage/resources over time.
As you can see I'm visualizing the integration Runtime itself (CPU/Memory) as well as the Azure VM that is hosting the Integration Runtime. On top of this you can go into the Metrics dashboard to set up alerts if certain conditions are met (eg AVG CPU % usage is over 75% for the last 15 minutes). These alerts can send you a text message, or email... and even do things as complicated as triggering a LogicApp or WebHook for automated scaling up/out, advanced notifications, etc.
This in my opinion is the best way to monitor but another option could be to call the Azure Data Factory REST API to get monitor data for the Integration Runtimes
POST https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.DataFactory/factories/{factoryName}/integrationRuntimes/{integrationRuntimeName}/monitoringData?api-version=2018-06-01
But this method would require you to incrementally pull in data, store it, parse it, and then visualize it or act upon it when that is already very well built in for you. Sometimes it's fun to recreate wheels though.
Yes It is possible to Monitor Azure Integration Runtime.
"Pipeline Runs" in Monitoring has the option to check the CPU Utilization specific to pipeline, Integration Runtime and more specific filters. You can find here, how its done.

Alternate to run window service in Azure cloud

We currently have a window service which send some notification emails to users after doing some processing on database(SQL database). Runs once in day.
We want to move this on azure cloud. One alternate is to put it on Azure VM as is. but I am finding some other best possible solution for that.
I study about recurring and on demand Web jobs but I am not sure is this is best solution.
Also is there any possibility to update configuration of service code in App.config without re-deploy the code of service on cloud. I means we can manage configuration from Azure portal.
Thanks in advance.
Update 11/4/2016
Since this was written, there are 2 additional features available in Azure that are both excellent choices depending on what functionality you need:
Azure Functions (which was based on the WebJobs described below): Serverless code that can be trigger/invoked in various ways, and has scaling support.
Azure Service Fabric: Microservice platform, with support for actor model, stateful and stateless services.
You've got 3 basic options:
Windows service running on VM
WebJob
Cloud service
There's a lot of information out there on the tradeoffs between these choices, but here's a brief summary.
VM - Advantages: you can move your service basically as it is without having to change much or any of your code. They also have the easiest connectivity with other resources in Azure (blob storage, virtual networks, etc). The disadvantage is you're giving up all the of PaaS advantages and are still stuck managing your own VM infrastructure
WebJob - Advantages: Multiple invocation options (queues, blobs, manually, queue receive loops, continuous while-loop style, etc), scheduled (would cover your case). Easy to deploy (can go with website, as a console app, automatically through Kudu), has some built in logging in Azure portal - and yes, to answer your question, you can alter the configuration in the portal itself for connection strings and app settings.
Disadvantages - you'll need to update code, you don't have access to underlying resources (if you need that), and more of something to keep in mind than a disadvantage - it uses the same resources as the webapp it's deployed with.
Web Jobs are the newest of the options, but at the same time appear to have active development going on to increase the functionality and usefulness.
Cloud Service - like a managed VM, has some deployment options, access to underlying VM if needed. Would require some code changes from your existing service.
There's nothing you've mentioned in your use case that makes me think a Web Job shouldn't be first thing you try.
(Edit: Troy Hunt has a great and relatively recent blog post illustrating most of the points I've mentioned about Web Jobs above: http://www.troyhunt.com/2015/01/azure-webjobs-are-awesome-and-you.html)

How to build a auto-scale Azure Cloud Service based on network usage?

Azure Cloud Services have auto-scale based on CPU / Queue. We have a set of machines running API for uploading and processing files. Although we moved the processing part on Worker Role that scale depending on the queue size, the servers but also take care of the upload while responding to other operations like downloading.
Right now we're using more machines for the just in case scenario, but we want to build a way to scale and to be cost-efficient while having a great upload experience for our users.
What would your approach be for creating a way to detect the network usage across all machines from the same Cloud Service and auto-scale if necessary?
I would:
1) Create metrics that calculate the amount of time it takes to download/upload a file.
2) Aggregate the metrics in some persistence layer (we have plenty in Azure)
3) Create a service that looks those metrics
4) Check the thresholds
5) Use the Management Libraries for .NET to trigger scaling on the Cloud Service(s) affected.
This approach also scales with your solution. You can eventually separate the scaling part from the checking part and have them as two different services, communicate asynchronously.
We also have an old, open source now project that does some of that for you, so you don't have to reinvent the wheel. It's called WASABi. Be careful though as this is not maintained anymore but as I said, you can use it as inspiration.

Logging and tracing on Azure

We are looking a solution for logging and tracing for our multi-tenant application with distributed architecture, that will be hosted on Azure.
We have already gone through these two articles – Troubleshooting Best Practices for Developing Windows Azure Applications and Enabling Diagnostics in Windows Azure. Is there anything other better solution?
We would like to know
• what are the best practices and approach for it?
o Storage strategy?
• Any third party / open source tool that helps us for the same?
EDIT:
We are looking for two things:
Best practice for storage strategy, where should we store log data? Since it's multi-tenant multi-tier application, should we keep data separate for each tier per tenant, combine them or any better solution? How do we store the data so that we can trace single request individually that spanned across multiple tiers?
A tool that helps us to view trace data, analyse them, filter, sort, etc. Since size of trace data will be comparatively huge, trace a flow of single task that spanned across multiple tiers.
I have used System.Diagnostic with XML listener, in on-premise application - with multiple tiers (web app, service layer 1, service layer 2, etc). I then, used Microsoft Service Trace Viewer to view the log data. SVCTraceViewer supports many features including combining log files of many tier, graphical representation, tracing individual request, etc.
So, some thing similar third party / open source tool for Azure. That also helps support engineer to drill down the issue and resolve it.
I would recommend looking into an open source library like log4net. It provides a pluggable/fully configurable and super flexible way to log messages with a lot of custom data and to a lot of sources. Configuration for it can be retrieved from external sources/xml, code, config files, etc.
You can create your own appender for Table Storage or find someone else's
HTH

Pulling data asynchronously from third-party web service on Windows Azure Platform

I want to pull large amount of data, frequently from different third party API web services and store it in a staging area (this is what I want to decide right now) from where it will be then moved one by one as required into my application's database.
I wanted to know that can I use Azure platform to achieve the above? How good is it to use Azure platform for this task?
What if the data to be pulled is of large amount and the frequency of the pull is high i.e. may be half-hourly or hourly for 2,000 different users?
I assume that if at all this is possible, then the bandwidth, data storage and server capability etc. will not be a thing to worry for me but for ©Microsoft. And obviously, I should be able to access the data back whenever I need it.
If I would have to implement it on Windows Servers, then I know that I would use a windows service to do this. But I don't know how it can be done for Windows Azure Platform if at all it is possible?
As Rinat stated, you can use Lokad's solution. If you choose to do it yourself, you can run a timed task in your worker role - maybe spawn a thread that sleeps, waking every 30 minutes to perform its task. It can then reach out to the Web Services in question (or maybe one thread per Web Service?) and fetch data. You can store it temporarily in Azure Table Storage, which is a fraction of the cost of SQL Azure (0.15 per GB), and then easily read it out of Table Storage on-demand and transfer to SQL Azure.
Assuming you host your services, storage and SQL Azure are in the same data center (by setting the affinity appropriately), you'd only pay for bandwidth when pulling data from the web service. There'd be no bandwidth charges to retrieve from Table Storage or insert into SQL Azure.
In Windows Azure that's usually Worker Role used to host the cloud processing. In order to accomplish your tasks you'll either need to implement this messaging/scheduling infrastructure yourself or use something like Lokad.Cloud or Lokad.CQRS open source projects for Azure.
We use Lokad.Cloud for distributed BI processing of hundreds of thousands of series and Lokad.CQRS allows to reliably retrieve and synchronize millions of products on schedule.
There are samples, docs and community in both projects to get you started.

Resources