Converting FTP data sync to Azure services

I have an old legacy application built on .NET Remoting that transfers data as XML files over FTP.
Essentially, a CRM system sends XML files to a directory on the web server, where a Windows service uses a file watcher to process each incoming XML file and update the database.
Similarly, changes made in the web application are serialized to an XML file in an out folder, which the CRM polls via FTP every 5 minutes.
I'm trying to work out which Azure services this maps to best.

You could use Azure Blobs or Azure Files for this.
Azure Blobs: This is the lowest-cost option, while still providing high throughput. However, note that Azure Blobs do not have file watcher functionality, so you would have to poll the container every few minutes to check for new files. If you delete files after processing them, this is really easy: all you have to do is list the container and see if there are any files. If you want to retain the files, you might have to do more, since the file list will grow over time. Let me know if this is the case and I can suggest some options.
Azure Files: This is an SMB share that can only be mounted from a VM in the same region. It maps pretty closely to your existing file-system-based code, including the file watcher.
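The delete-after-processing polling loop described above might look roughly like the sketch below. It assumes a hypothetical `IBlobFolder` interface (`ListNames`, `Download` and `Delete` are illustrative names, not an Azure SDK API) so the logic can run without a storage account; in a real service you would implement that interface on top of the Azure Storage client library and call `PollOnce` from a timer every few minutes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical abstraction over a blob container (not an Azure SDK type).
public interface IBlobFolder
{
    IEnumerable<string> ListNames();
    string Download(string name);
    void Delete(string name);
}

// In-memory stand-in so the polling logic can be exercised locally.
public sealed class InMemoryBlobFolder : IBlobFolder
{
    private readonly SortedDictionary<string, string> _blobs = new(StringComparer.Ordinal);

    public void Upload(string name, string content) => _blobs[name] = content;
    public IEnumerable<string> ListNames() => _blobs.Keys.ToList(); // snapshot, so deleting while iterating is safe
    public string Download(string name) => _blobs[name];
    public void Delete(string name) => _blobs.Remove(name);
}

public static class InboundPoller
{
    // One polling pass: process every blob currently listed, deleting each one
    // only after "process" has returned. Because processed blobs are removed,
    // the next pass lists only new arrivals. Returns the number processed.
    public static int PollOnce(IBlobFolder folder, Action<string, string> process)
    {
        int count = 0;
        foreach (string name in folder.ListNames())
        {
            process(name, folder.Download(name)); // e.g. parse the XML and update the database
            folder.Delete(name);
            count++;
        }
        return count;
    }
}
```

If `process` throws, the exception propagates out of `PollOnce` before `Delete` is called, so the failing file stays in the container and is picked up again on the next pass.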

Related

Azure Background Services For File Processing

We currently have a Windows service to process inbound and outbound files.
For inbound files, we read the data, perform some calculations, and store the results in the database.
Outbound files are generated from data in the database.
We now want to migrate to Azure, and I have the following questions:
1) What is the best way to store files in Azure: Blob storage or an Azure file share? We only have .pdf, .txt and .xlsx files (no videos).
2) Which option is better for processing the files: WebJobs, a virtual machine running the Windows service, Azure Batch, Azure Kubernetes Service, or Service Fabric?
Can someone please help me with this?
Thanks

How are you receiving the files: through an API, over FTP, or some other way? There are a ton of details needed to really answer this, but here are my thoughts.
Blob storage would be more cost-effective. You only need a file share if you want to be able to map a network drive from a VM.
If processing one file would complete in less than 10 minutes, I would look at Azure Functions for that. If you’re processing thousands of files per day, Azure Functions would be expensive, so I would look at running them on App Service or on VMs, or moving to Service Fabric.
If you have a website that’s used to upload the files and you’re already using Azure App Service, then you could use WebJobs.

Are generated files persisted on Azure?

I have an Azure Web App, which will generate pdf files at runtime and write them to disk. Can I trust that these files will be persisted?
I am concerned that if my image is spun down and brought back up again then the files might have disappeared.
Or perhaps Azure decides to move the website to a different machine or different datacentre, where these files would not exist.
I know there are cloud based options such as blob storage, but I would prefer the simplicity of writing to disk and having access over FTP.

Anything that you write under the d:\home folder is guaranteed to be persisted. See the File System section in this for more details on this topic.
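If you do write to disk, resolving the folder from the `HOME` environment variable (which App Service on Windows sets to `D:\home`) avoids hard-coding the drive letter. This is a sketch under that assumption; the `data\pdfs` subfolder and the local-development fallback path are hypothetical choices. On a developer machine `HOME` may be unset or point somewhere else, so the pure `PdfFolder` overload takes the value explicitly.

```csharp
using System;
using System.IO;

public static class PersistentPaths
{
    // Builds the folder for generated PDFs under the persisted content root.
    // "home" is the value of HOME (D:\home on App Service for Windows);
    // when it is missing or empty, the hypothetical local fallback is used.
    public static string PdfFolder(string? home, string localFallback)
    {
        string root = string.IsNullOrEmpty(home) ? localFallback : home;
        return Path.Combine(root, "data", "pdfs");
    }

    // Reads HOME from the current process and creates the folder if needed.
    public static string EnsurePdfFolder(string localFallback)
    {
        string folder = PdfFolder(Environment.GetEnvironmentVariable("HOME"), localFallback);
        Directory.CreateDirectory(folder); // no-op if the folder already exists
        return folder;
    }
}
```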

Azure Architecture Design

I'm new to Azure and a little confused about blob storage. I need clients to be able to push and pull files (XML, CSV, EDI, etc.) over FTP/SFTP. The pushed files are read by a .NET application and written to a database. As I understand it, we would use a VM role to create an FTP/SFTP server, a worker role to execute the .NET code, SQL storage for the database, and Blob storage for the files.
First, is this assumption correct? Second, can a VM role attach to blob storage for reading and writing files, and can a worker role attach to the same blob storage to read and write files as well?
Example:
A client pushes an XML file to the VM via FTP. The VM writes the XML file to storage. The worker role reads the file, processes it, and writes its contents to the database.
Is my thinking correct or am I missing the boat?
Thanks

Azure has an array of services, so you have a few options. One important thing to keep in mind with Azure is that your worker roles, which are simply Windows Server 2008 without IIS installed, are very flexible, so there is a lot you can do with them – this includes writing your own FTP server and hosting it in worker role VMs. The FTP to Azure Blob Storage Bridge (on CodePlex) solution is an example of this.
In addition, you could use a web role (which is the same as a worker role but with IIS enabled) to do the same - so rather than rolling your own FTP server you can use IIS. A visual guide to setting IIS up to run as an FTP server in Azure can be found on ITQ.
I’d recommend some further reading to determine which of the two is the better option. Also think about your requirements, as they may influence your approach, e.g. scaling, bandwidth, costs, your preferred deployment model, etc.
As far as storing the files goes you can certainly use Blob Storage. If you have no need for a relational database in your system then you could skip using SQL Azure altogether (in which case the web role solution referenced above won’t be of much use) – but again that comes down to your particular requirements.
The official Windows Azure website is a good source of knowledge, especially if you’re getting started, so do take the time to look through some of the pertinent documentation.
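For the worker-role step in the sample flow (read the file, process it, write its contents to the database), the parsing part could be sketched with LINQ to XML. The `<orders>` document shape and the `OrderRow` record are hypothetical, since the question doesn't show the real XML; the database write is left to whatever data access you already use.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical row shape; the real columns depend on your XML and database schema.
public sealed record OrderRow(string Id, string Customer, decimal Amount);

public static class InboundXmlParser
{
    // Parses a hypothetical document of the form
    // <orders><order id="..." customer="..." amount="..."/>...</orders>
    // into rows that the worker role can then insert into the database.
    public static List<OrderRow> Parse(string xml)
    {
        XElement root = XDocument.Parse(xml).Root
            ?? throw new FormatException("Empty XML document");
        return root.Elements("order")
            .Select(e => new OrderRow(
                Required(e, "id"),
                Required(e, "customer"),
                decimal.Parse(Required(e, "amount"), CultureInfo.InvariantCulture)))
            .ToList();
    }

    // Returns the attribute value, or throws if the attribute is missing.
    private static string Required(XElement e, string name) =>
        e.Attribute(name)?.Value
            ?? throw new FormatException($"<{e.Name}> is missing the '{name}' attribute");
}
```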

Backup Azure Virtual Machine local folders to blob storage?

I've just set up an extra small VM instance in Windows Azure to run a help console for our company. The help files can be updated and published through a simple .NET interface. The flat HTML files are deployed to the local drive on the VM and exposed publicly through IIS. I'm wondering how durable this is: if the VM suffers a hardware failure, presumably there's no automatic failover, and any edits we've made to the help system will be lost?
Can anyone recommend a way to shuttle the source files out of the VM into blob storage? I could write an application to do this, but I'm wondering if there is an out-of-the-box solution.
Additional information:
The VM instance is running Server 2008 R2 SP1 (as a Virtual Machine, not a web role)
A backup needs to be created once every 24 hours
Aged backups (3+ days old) need to be automatically cleared from the blob container
The help system we use is called HelpConsole 2012
New pages are added at a rate of maybe 2-3 per week

The answer depends on whether you are running this in a Windows Azure Virtual Machine or in a Windows Azure Web role.
If you are running this on a Windows Azure Virtual Machine, the VHD is stored in BLOB storage, and if the site is running off the C: drive rather than a data disk, host caching is turned on for both reads and writes. In this scenario it is possible (depending on the methods you use to write your files out) that the data is not pushed back to the VHD in BLOB storage before a failure occurs. You can either ensure that your write methods do a write-through operation, or turn off write caching. Better yet, attach a data disk for your website files. By default, data disks have both read and write caching off (you could turn on read caching). Since the VHDs are persisted, you don't have to worry about the edits getting lost. You can script taking a snapshot of the files and moving them to BLOB storage separately, or even push them somewhere else. Another thing to consider with this option is that you have to maintain the VM instances and keep them patched and up to date.
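For the question's requirements (a backup every 24 hours, backups 3+ days old cleared automatically), the naming and cleanup logic of such a script could look like the sketch below. The `helpconsole-yyyyMMdd.zip` naming scheme is a hypothetical convention; zipping the help files, uploading the archive and deleting the expired blobs would use your storage client of choice and are not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class BackupRetention
{
    // Hypothetical naming scheme: one archive per day, e.g. "helpconsole-20130115.zip".
    private const string Prefix = "helpconsole-";
    private const string Suffix = ".zip";
    private const string DateFormat = "yyyyMMdd";

    public static string NameFor(DateTime utcDay) =>
        Prefix + utcDay.ToString(DateFormat, CultureInfo.InvariantCulture) + Suffix;

    // Returns the backup names that are at least maxAgeDays old relative to utcNow,
    // i.e. the blobs a daily cleanup run would delete. Names that don't follow
    // the scheme are ignored rather than deleted.
    public static List<string> Expired(IEnumerable<string> names, DateTime utcNow, int maxAgeDays = 3)
    {
        var expired = new List<string>();
        foreach (string name in names)
        {
            if (!name.StartsWith(Prefix, StringComparison.Ordinal) ||
                !name.EndsWith(Suffix, StringComparison.Ordinal))
                continue;

            string stamp = name.Substring(Prefix.Length, name.Length - Prefix.Length - Suffix.Length);
            if (DateTime.TryParseExact(stamp, DateFormat, CultureInfo.InvariantCulture,
                    DateTimeStyles.None, out DateTime day)
                && (utcNow.Date - day).TotalDays >= maxAgeDays)
            {
                expired.Add(name);
            }
        }
        return expired;
    }
}
```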
If you are running a Web role, then yes: if a failure occurs and the VM goes through self-healing, it will redeploy with the older files. In this case I'd recommend changing the code in the web role so that when it writes an update to the local file, it also puts a copy of that file into BLOB storage. In addition, in the web role's OnStart you could reach out to BLOB storage and pull down all the new content locally. BE VERY CAREFUL with this approach, though, because it only really works well for ONE instance, not multiple. If you plan on running multiple instances of the server (and you will have to if you want the uptime SLA), your code will need to be more robust: write out to BLOB storage and then alert all instances of the role that there is a new file to pull down locally.
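The OnStart pull-down (and the "new file" notification for multiple instances) boils down to deciding which files an instance is missing. A minimal sketch, assuming you can obtain a name-to-last-modified map for both BLOB storage and the local folder. Comparing timestamps across machines is itself an assumption that clock differences can skew; recording the blob's own timestamp or ETag locally after each download is more reliable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContentSync
{
    // Returns the names an instance should download: anything missing locally
    // or newer in BLOB storage, which is treated as the source of truth.
    public static List<string> NamesToPull(
        IReadOnlyDictionary<string, DateTimeOffset> remote,
        IReadOnlyDictionary<string, DateTimeOffset> local)
    {
        return remote
            .Where(kv => !local.TryGetValue(kv.Key, out DateTimeOffset localTime) || kv.Value > localTime)
            .Select(kv => kv.Key)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }
}
```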
Another option for web roles is to write a handler for the content so that incoming requests are mapped directly to a file in BLOB storage. Updates can then be made by editing the file in BLOB storage directly. This offloads the serving of the flat files from your compute nodes to BLOB storage, and you could even implement some caching and stream the content back through the handler rather than having requests hit BLOB storage directly, if you wanted to.
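Such a handler needs a rule for turning a request path into a blob name. Here is a sketch of one such rule, assuming the help pages live in a single container with the same folder layout as the site; rejecting empty, `.` and `..` segments keeps a request from addressing anything outside that layout. Fetching and streaming the blob itself is not shown.

```csharp
using System;
using System.Linq;

public static class BlobPathMapper
{
    // Maps "/help/getting-started.html" to "help/getting-started.html".
    // Returns null for the root path and for paths containing empty, "." or ".." segments.
    public static string? ToBlobName(string requestPath)
    {
        string trimmed = requestPath.Trim().TrimStart('/');
        if (trimmed.Length == 0) return null;

        string[] segments = trimmed.Split('/');
        if (segments.Any(s => s.Length == 0 || s == "." || s == ".."))
            return null;

        // Blob names are case-sensitive, so the path is passed through unchanged.
        return trimmed;
    }
}
```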
Another option is to use Windows Azure Web Sites for this. The underlying storage of the site files in Windows Azure Web Sites is a shared location, so updating the files there is immediately reflected for all instances. The content for the site is also stored in BLOB storage and can be updated via FTP, source control, or directly from code. Lots of options here. You may end up moving to reserved instances to stay clear of some of the quotas that Web Sites have. Web Sites may not be an option for you currently, depending on other requirements (such as how much control you need over the environment, since you don't get a lot of control with Web Sites).

Azure WCF accessing disk files

I have a WCF service hosted on Windows Azure as a "cloud service." When the service starts, it needs to load data from files on disk into memory so that it can be accessed quickly (cached, in other words). Right now I'm using a folder such as C:\Documents\Filestoprocess: the WCF service reads that folder and loads the data from its files into memory. I have about 5,000 small files. How do I do this in Azure? Is there a folder path I can use from within the WCF service so that it opens each of these files and loads its data? I'm not really looking for complicated Blob access over the network using bandwidth; I'm looking for simple disk I/O access to these files from the WCF "cloud service" that is running on its own public web address.

You should use a cloud storage service to store data, because if you write to the local file system, the data can be destroyed when the service restarts or is recycled.
You can look into the Azure Drive service, which is like creating a disk drive; it is built on top of blob storage.
But if you really want to write and read data on the local file system, check out this blog post: http://blog.codingoutloud.com/2011/06/12/azure-faq-can-i-write-to-the-file-system-on-windows-azure/
It talks about setting up your service definition to allow writing to the local file system.

Depending on the size of your instances, you'll get a non-persistent disk where you can store this kind of temporary data. The minimum is 20GB for an extra small instance. You shouldn't access the disk directly; instead, use a local resource, which you can configure in your service definition file or in Visual Studio (double-click your Web/Worker Role).
This storage is non-persistent: if you delete your deployment, decrease the number of instances, or hit hardware problems, you lose all data saved here. If you want to persist your files, use blob storage instead. But in your case, where you need the files as a kind of caching mechanism, local resources are perfect.
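Priming the in-memory cache from a local storage folder can be as simple as reading the folder into a dictionary. A sketch under a few assumptions: the files are small text files, the folder path comes from the local resource's `RootPath`, and the code targets .NET Core 2.0 or later (`Path.GetRelativePath` is not available on .NET Framework, so an older WCF role would need a different way to build the keys).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileCache
{
    // Reads every file under rootPath into memory, keyed by its path relative
    // to rootPath with '/' separators. Keys are case-insensitive to match
    // Windows file-system semantics. Suited to many small text files.
    public static Dictionary<string, string> Load(string rootPath)
    {
        var cache = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string path in Directory.EnumerateFiles(rootPath, "*", SearchOption.AllDirectories))
        {
            string key = Path.GetRelativePath(rootPath, path).Replace('\\', '/');
            cache[key] = File.ReadAllText(path);
        }
        return cache;
    }
}
```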
And if your goal is to cache data you might want to take a look at the caching features included in Windows Azure: Caching in Windows Azure

Blob access is not complex. In fact, you could do a single download of a zip file from blob storage to local disk, unzip it, then prime your wcf service from those 5,000 small files.
Check out this MSDN page documenting DownloadToFile(). The essential parts:
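using Microsoft.WindowsAzure;                // StorageCredentialsAccountAndKey (storage client library 1.x)
using Microsoft.WindowsAzure.StorageClient;  // CloudBlobClient, CloudBlob (storage client library 1.x)

CloudBlobClient blobClient =
    new CloudBlobClient(blobEndpoint, new StorageCredentialsAccountAndKey(accountName, accountKey));
// Return a reference to the blob.
CloudBlob blob = blobClient.GetBlobReference("mycontainer/myblob.txt");
// Download the blob to a local file.
blob.DownloadToFile("c:\\mylocalblob.txt");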
Now: I don't agree with saving to the root folder of C:. Rather, you should grab some local storage (easily configurable). Once you configure local storage in your role configuration, just ask the role environment for it, and ask for its root path:
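using Microsoft.WindowsAzure.ServiceRuntime;  // RoleEnvironment, LocalResource

// "mylocalstorage" must match the local storage name in the service definition.
var localResource = RoleEnvironment.GetLocalResource("mylocalstorage");
var rootPath = localResource.RootPath;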
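The download-a-zip approach could finish like this: once the archive has been downloaded (for example with `DownloadToFile`) into the local storage folder, extract it and then prime the cache from the extracted files. A sketch only; the `content` subfolder name is hypothetical, and on .NET Framework `ZipFile` requires version 4.5+ and a reference to System.IO.Compression.FileSystem.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ContentBootstrap
{
    // Extracts a downloaded archive into <localRoot>\content, replacing any
    // previous extraction so each (re)start begins from a clean copy.
    // Returns the number of files extracted.
    public static int ExtractArchive(string zipPath, string localRoot)
    {
        string target = Path.Combine(localRoot, "content"); // hypothetical subfolder name
        if (Directory.Exists(target))
            Directory.Delete(target, recursive: true);
        ZipFile.ExtractToDirectory(zipPath, target);        // creates the target folder
        return Directory.GetFiles(target, "*", SearchOption.AllDirectories).Length;
    }
}
```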
Note: As @KingPancake mentioned, you could use an Azure drive. However, remember that an Azure drive can be writable by only one instance; you'd need to make additional snapshots for your other instances. I think it's much simpler for you to go with a simple blob, copy your files down (either as a single zip or as individual files), and go from there.
You mentioned concerns about network bandwidth. You don't pay for bandwidth within the same data center. It's also extremely fast: 100Mbps per core. So even with a Small instance, your files will be copied down very quickly, and more so on larger instance sizes.
One last thought: The only other ways to gain access to your 5,000 files, without using blob storage or Azure Drives (which are mounted as vhd's in blob storage) would be to either download the files from an external source or bundle them with your Windows Azure package (and then they'd show up in your app's folder, under whatever subfolder you stuck them in). Bundling has two downsides:
1) Longer time to upload your deployment package due to the added size
2) Inability to change any of the individual files without redeploying the package.
By storing in a blob, you can easily change one (or all) of your small files without redeploying your code - you'd just need to signal it to either re-read from blob storage or restart the instances so they automatically download the new files.
