Azure Storage - File Share - Move 16m files in nested folders

Posting here as Server Fault doesn't seem to have the detailed Azure knowledge.
I have an Azure storage account with a file share. The file share is mounted on an Azure VM as a mapped drive. An FTP server on the VM accepts a stream of files and stores them directly in the file share.
There are no other connections. Only I have Azure admin access; a limited number of support people have access to the VM.
Last week, for unknown reasons, 16 million files, nested in many sub-folders (by origin and date), moved instantly into an unrelated subfolder three levels deep.
I'm baffled as to how this could happen. There is a clear, instant cut-off point at which the files moved.
As a result, I'm seeing increased costs on LRS, presumably because Azure Storage is internally replicating the change at my expense.
I have attempted to copy the files back using a VM and AzCopy. This process crashed midway through, leaving me with a half-completed copy operation. This failed attempt took days, which makes me confident it wasn't the support guys dragging a folder by accident.
Questions:
Is it possible to instantly move so many files, and if so, how?
Is there a solid way to move the files back, taking into account the half-copied files - I mean an Azure back-end operation rather than writing an app / PowerShell / AzCopy?
Is there a cost-efficient way of doing this? (I'm on the Transaction Optimised tier.)
Do I have a case here to get Microsoft to do something? We didn't move them... I assume something internally messed up.
Thanks

A tool that supports server-side copy (like AzCopy) can move the files quickly, because the data never travels through the client - the copy happens inside the storage service. If you want to investigate the root cause, I recommend opening a support case; the Azure support team can help with this on a best-effort basis.
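For illustration, a server-side move back could look roughly like this with AzCopy. This is only a sketch: the account, share, and folder names and the <SAS> tokens are placeholders, and you would pick the --overwrite behaviour that matches the state of your half-finished earlier attempt.

# Sketch: copy the misplaced tree back to the share root entirely server-side.
# Account, share, folder names and the SAS tokens are placeholders.
azcopy copy \
  "https://mystorageacct.file.core.windows.net/myshare/level1/level2/level3/*?<SAS>" \
  "https://mystorageacct.file.core.windows.net/myshare/?<SAS>" \
  --recursive \
  --overwrite=false   # skip files the earlier half-finished attempt already copied

Only delete the misplaced source folder after verifying the copy completed.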

Related

Using the temp directory for Azure Functions

I have a set of Azure functions running on the same host, which scales up to many instances at times. I'd like to store a very small amount of ephemeral data (a few KB) and opportunistically share that data between function executions. I know that the temp directory is only available to the functions running on that same instance. I also know that I could use the home directory, durable functions, or other Azure storage (such as blob) to share data between all functions persistently.
I have two main questions
What are the security implications of using the temp directory? Who can access its contents outside of the running function?
Is this still a reasonable solution? I can't find much in the way of Microsoft documentation outside of what looks like some outdated kudu documentation here.
Thanks!
Answer to Question 1
Yes, it is secure. The Functions host process runs inside a sandbox. All data stored under D:\local is self-contained and isolated to the processes within the sandbox. See https://github.com/projectkudu/kudu/wiki/Azure-Web-App-sandbox
Answer to Question 2
The data in D:\local\Temp exists as long as the Functions host process is alive. The host process can be recycled at any time due to unexpected events such as unhandled exceptions, timeouts, or hitting the resource usage limits of your plan. As long as your workflow accounts for the fact that data stored in D:\local\Temp is ephemeral, then yes, this is still a reasonable solution.
Also, folders/files created via code inside the "Temp" folder cannot be viewed when you visit the Kudu site, but your code can still use those files/folders.
How do you view those files/folders from Kudu?
You need to add WEBSITE_DISABLE_SCM_SEPARATION = true in Configuration (app settings).
Note: the main site and the SCM (Kudu) site do not share temp files, so if you write files there from your site, you will not see them from the Kudu console (and vice versa).
You can make them use the same temp space if you disable separation (via WEBSITE_DISABLE_SCM_SEPARATION).
But note that this is a legacy flag, and its use is not recommended/supported.
(ref : shared document link)
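If you do decide to try it, the flag is set like any other app setting, for example with the Azure CLI. A minimal sketch, where the app and resource group names are placeholders and, again, the flag itself is legacy and unsupported:

# Sketch only: set the legacy flag as an app setting (names are placeholders).
az functionapp config appsettings set \
  --name my-function-app \
  --resource-group my-rg \
  --settings WEBSITE_DISABLE_SCM_SEPARATION=true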
Security implications depend on the level of isolation you are seeking.
In a shared App Service plan or the Consumption plan, you need to trust the sandbox isolation. This is not an isolated micro-VM like AWS Lambda.
If you have your own App Service plan, then you need to trust the hypervisor isolation of the VMs in that plan.
If you are really paranoid or running a healthcare application, then you likely need to run your function in an App Service Environment (ASE).
A reasonable solution is one where the cost does not exceed the worth of the data you are protecting :)

Automated processing of large text file(s)

The scenario is as follows: a large text file is put somewhere. At a certain time of the day (or manually, or after x number of files), a virtual machine with BizTalk installed should start automatically to process these files. The files should then be put in some output location and the VM should be shut down. I don't know how long processing these files takes.
What is the best way to build such a solution? The solution is preferably to be used for similar scenarios in the future.
I was thinking of Logic Apps for the workflow, blob storage or FTP for input/output of the files, an API App for starting/shutting down the VM. Can Azure Functions be used in some way?
EDIT:
I also asked the question elsewhere, see link.
https://social.msdn.microsoft.com/Forums/en-US/19a69fe7-8e61-4b94-a3e7-b21c4c925195/automated-processing-of-large-text-files?forum=azurelogicapps
Just create an Azure Automation Runbook with a schedule. Have the Runbook check for specific files in a storage account; if they exist, start the VM and wait until the files are gone. Once the files are gone (meaning BizTalk has processed them, deleted them, and put them where they belong), the Runbook stops the VM.
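The flow such a Runbook would follow looks roughly like this. The sketch below uses the Azure CLI purely for illustration (an Automation Runbook would normally be written in PowerShell), and the resource group, VM, storage account and container names are all hypothetical:

#!/usr/bin/env bash
# Illustrative only: poll an input container, run the BizTalk VM while files exist.
# Assumes the CLI is already logged in and has access to the storage account.
set -euo pipefail

pending() {
  az storage blob list --account-name mystorageacct --container-name inbox \
    --query "length(@)" -o tsv
}

if [ "$(pending)" -gt 0 ]; then
  az vm start --resource-group my-rg --name biztalk-vm
  # Wait for BizTalk to pick up and remove the files.
  while [ "$(pending)" -gt 0 ]; do
    sleep 300
  done
  az vm deallocate --resource-group my-rg --name biztalk-vm
fi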

IIS Virtual Directory in Azure

I've been told that you can create virtual directories in IIS hosted on Azure, but I'm struggling to find any info on this as it's a relatively new feature. I'd like to point the virtual directory to an Azure Drive (XDrive, NTFS drive) so that I can reference resources on the drive.
I'm migrating an on-premises website to Azure and need to minimise the amount of rework / redevelopment required. Currently the website has access to shared content folders, and I'm trying to mimic a similar set-up due to tight timescales.
Does anyone have any knowledge of this or pointers for me as I can't find any information on how to do this?
Any information / pointers you have would be great
Thanks
Steve
I haven't had a moment to check myself, but get the latest copy of the Windows Azure Platform Training Kit. I'm fairly certain that it has a hands-on lab that demonstrates the new feature. However, I do not believe that lab includes creating a virtual directory on an Azure Drive. Even if you can point it there, you may run into some .NET security limitations. http://www.microsoft.com/downloads/en/details.aspx?FamilyID=413e88f8-5966-4a83-b309-53b7b77edf78&displaylang=en
Another resource to look into might be the stuff Cory Fowler is doing http://blog.syntaxc4.net/ He's been spending some time of late really digging into the internals of the new 1.3 roles. So he might be able to lend you a hand.
I've been kicking this issue around for some time now. I can upload a VHD to Azure, and I can create a virtual directory in Azure that points to a physical location on my PC (when running in the dev fabric), but here's the catch...
I can't find any examples on where I can do both at the same time, i.e. mount a drive and then map a virtual directory to it.
I've had a look in the 1.3 SDK and looked at various blogs but I can't see any pointers on this - I guess I may have got hold of the wrong end of the stick. If anyone knows how or if this can be done, that would be great.

Should DFS be used to sync wwwroot?

I'm wondering if it's a good idea to use DFS to sync content across a web farm? Does anyone have any experience of this? We've used Robocopy in the past but found it a little patchy and clunky.
Essentially we want to avoid having to make ten changes to content each time one file changes (this happens a lot since our site is old and uses classic ASP).
From what I gather, DFS is usually meant for geographically separated locations and used to make UNC shares appear simpler to users and easier to manage.
What I'd like to achieve with it is to only copy content changes to one of ten servers which will be the hub. I'd then configure the other nine servers as spokes using FRS.
Any thoughts on this methodology or suggestions for better setups would be much appreciated.
For performance reasons, don't point a web site to a UNC path. SMB file access is horribly inefficient and slow compared to pretty much any other file access method.
You can use DFS-R (via Windows 2003 R2) to enable replication between DFS-enabled shares, but definitely setup IIS to point to the share's local path, not UNC.
If you're using Win2003, make sure to install R2; DFS replication is much improved and doesn't use FRS. It will do what you want, even over a LAN.
Don't use FRS for this, it may get confused. Using DFS and another sync technique, such as Symantec Replication Exec, works fine. Make sure to create the correct site structure with IP ranges in Active Directory so that the correct servers are chosen by DFS.
I tried that some years ago with FRS, when Windows 2003 was new (before SP1; things may have become better since then, but I'm not sure). FRS twice went completely nuts and deleted our files, not to mention the number of times it just clogged up and failed to recover itself. FRS also only syncs files which are closed; files which are left open are not synced (when doing log file collection, for instance). FRS is fine in environments where you have a moderate number of relatively small files with not too many changes going on on the server.
I have very recently disabled the UNC DFS as the site root on a server; under heavy load the site would become unresponsive to requests. Pointing the site wwwroot to a local drive and restarting IIS quickly restored the site speed. I have to recommend that if you go the DFS route, simply have it replicate to a local drive instead of using the UNC path as the wwwroot.

Good Secure Backups Developers at Home [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
What is a good, secure, method to do backups, for programmers who do research & development at home and cannot afford to lose any work?
Conditions:
The backups must ALWAYS be within reasonably easy reach.
Internet connection cannot be guaranteed to be always available.
The solution must be either FREE or priced within reason, and subject to 2 above.
Status Report
For now, only free options are being considered.
The following open-source projects are suggested in the answers (here & elsewhere):
BackupPC is a high-performance, enterprise-grade system for backing up Linux, WinXX and MacOSX PCs and laptops to a server's disk.
Storebackup is a backup utility that stores files on other disks.
mybackware: These scripts were developed to create SQL dump files for basic disaster recovery of small MySQL installations.
Bacula is [...] to manage backup, recovery, and verification of computer data across a network of computers of different kinds. In technical terms, it is a network-based backup program.
AutoDL 2 and Sec-Bk: AutoDL 2 is a scalable, transport-independent automated file transfer system. It is suitable for uploading files from a staging server to every server on a production server farm [...] Sec-Bk is a set of simple utilities to securely back up files to a remote location, even a public storage location.
rsnapshot is a filesystem snapshot utility for making backups of local and remote systems.
rbme: Using rsync for backups [...] you get perpetual incremental backups that appear as full backups (for each day) and thus allow easy restore or further copying to tape etc.
Duplicity backs up directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. [...] It uses librsync [for] incremental archives.
simplebup, to do real-time backup of files under active development, as they are modified. This tool can also be used for monitoring other directories as well. It is intended as on-the-fly automated backup, not as version control. It is very easy to use.
Other Possibilities:
Using a Distributed Version Control System (DVCS) such as Git(/Easy Git), Bazaar, Mercurial answers the need to have the backup available locally.
Use free online storage space as a remote backup, e.g.: compress your work/backup directory and mail it to your gmail account.
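As a rough illustration of the "mail it to yourself" idea: a one-liner along these lines would do it, assuming a command-line mailer such as mutt is already configured with an SMTP account, and with the directory and address as placeholders.

# Sketch: archive the work directory and mail it to yourself (placeholders throughout).
tar czf "work-$(date +%F).tar.gz" -C "$HOME" work
mutt -s "work backup $(date +%F)" -a "work-$(date +%F).tar.gz" -- me@example.com < /dev/null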
Strategies
See crazyscot's answer
I prefer http://www.jungledisk.com/ .
It's based on Amazon S3, cheap, multiplatform, multiple machines with a single license.
USB hard disk + rsync works for me
(see here for a Win32 build)
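For instance, a minimal rsync invocation for this might look like the following (paths are placeholders; --delete mirrors deletions, so leave it out if you want removed files kept on the backup disk):

# Sketch: mirror a work directory to a mounted USB disk (paths are placeholders).
rsync -av --delete ~/work/ /media/usbdisk/work-backup/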
Scott Hanselman recommends Windows Home Server in his aptly titled post
The Case of the Failing Disk Drive or Windows Home Server Saved My Marriage.
First of all: keeping backups off-site is as important for individuals as it is for businesses. If your house burns down, you don't want to lose everything.
This is especially true because it is so easy to accomplish. Personally, I have an external USB hard disk I keep at my father's house. Normally, it is hooked up to his internet connection and I back up over the net (using rsync), but when I need to back up really big things, I collect it and copy things over USB. Ideally, I should get another disk to spread the risk.
Other options are free online storage facilities (use encryption!).
For security, just use TrueCrypt. It has a good name in the IT world, and seems to work very well.
Depends on which platform you are running on (Windows/Linux/Mac/...?)
As a platform-independent approach, I use a personal Subversion server. All the valuables are there, so if I lose one of the machines, a simple 'svn checkout' will bring things back. This takes some initial work, though, and requires discipline. It might not be for you?
As a second backup for the non-svn stuff, I use Time Machine, which is built into OS X. Simply great. :)
I highly recommend www.mozy.com. Their software is easy and works great, and since it's stored on their servers you implicitly get offsite backups. No worrying about running a backup server and making sure it's working. Also, the company is backed by EMC (a leading data storage product company), so gives me enough confidence to trust them.
I'm a big fan of Acronis True Image. Make sure you rotate through a few backup HDDs so you have a few generations to go back to, or in case one of the backups goes bang. If it's a major milestone I snail-mail a set of DVDs to Mum and she files them for me. She lives in a different state, so it should cover most disasters of less-than-biblical proportions.
EDIT: Acronis has encryption via a password. I also find the bandwidth of snail mail to be somewhat infinite - 10 GB in a day is roughly 115 KB/s, give or take. Never been throttled by Australia Post.
My vote goes for cloud storage of some kind. The problem with nearly all 'home' backups is that they stay in the home, which means any catastrophic damage to the system being backed up will probably damage the backups as well (fire, flood, etc.). My requirements would be:
1) automated - manual backups get forgotten, usually just when most needed
2) off-site - see above
3) multiple versions - that is, back up to more than one thing, in case that one thing fails.
As a developer, usually data sizes for backup are relatively small so a couple of free cloud backup accounts might do. They also often fulfil part 1 as they can usually be automated. I've heard good things about www.getdropbox.com/.
The other advantage of more than 1 account is you could have one on 'daily sync' and another on 'weekly sync' to give you some history. This is nowhere near as good as true incremental backups.
Personally I prefer a scripted backup to local hard drives, which I rotate to work as 'offsites'. This is in large part due to my hobby (photography) and thus my relatively lame internet upstream bandwidth not coping with the data volume.
Take home message - don't rely on one solution and don't assume that your data is not important enough to think about the issues as deeply as the 'Enterprise' does.
Buy a fire-safe.
This is not just a good idea for storing backups, but a good idea period.
Exactly what media you put in it is the subject of other answers here.
But, from the perspective of recovering from a fire, having a washable medium is good. As long as the temperature doesn't get too high CDs and DVDs seem reasonably resilient, although I'd be concerned about smoke damage.
Ditto for hard-drives.
A flash drive does have the benefit that there are no moving parts to be damaged and you don't need to be concerned about the optical properties.
mozy.com is king. I started using it just to backup code and then ponied up the 5 bux a month to backup my personal pictures and other stuff that I'd rather not lose if the house burns down. The initial backup can take a little while but after that you can pretty much forget about it until you need to restore something.
Get an external hard drive with a network port so you can keep your backups in another room which provides a little security against fire in addition to being a simple solution you can do yourself at home.
The next step is to get storage space in some remote location (there are very cheap monthly prices for servers for example) or to have several external hard drives and periodically switch between the one at home and a remote location. If you use encryption, this can be anywhere such as a friend's or parents' place or work.
Bacula is good software: it's open source and gives good performance, comparable to commercial software. It's a bit difficult to configure the first time, but not that hard, and it has good documentation.
I second the vote for JungleDisk. I use it to push my documents and project folders to S3. My average monthly bill from amazon is about 20c.
All my projects are in Subversion on an external host.
As well as this, I am on a Mac, so I use SuperDuper to take a nightly image of my drive. I am sure there are good options in the Windows/Linux world.
I have two external drives that I rotate on a weekly basis, and I store one of the drives off-site during its week off.
This means that I am only ever 24 hours away from an image in case of failure, and only 7 days from an image in case of catastrophic failure (fire, theft). The ability to plug the drive into a machine and be running instantly from the image has saved me immensely. My boot partition was corrupted during a power failure (not a hardware failure, luckily). I plugged the backup in, restored, and was working again in the time it took to transfer the files off the external drive.
Another vote for mozy.com
You get 2 GB for free, or $5/month gets you unlimited backup space. Backups can occur on a timed basis, or when your PC/Mac is not busy. It's encrypted during transit and storage.
You can retrieve files via built in software, through the web or pay for a DVD to be burned and posted back.
William Macdonald
If you feel like syncing to the cloud and don't mind the initial, beta, 2GB cap, I've fallen in love with Dropbox.
It has versions for Windows, OSX, and Linux, works effortlessly, keeps files versioned, and works entirely in the background based on when the files changed (not a daily schedule or manual activations).
Ars Technica and Joel Spolsky have both fallen in love with the service (though the love seems strong with Spolsky, but let's pretend!), if the word of a random internet geek is not enough.
These are interesting times for "the personal backup question".
There are several schools of thought now:
Frequent Automated Local Backup + Periodic Local Manual Backup
Automated: Scheduled Nightly backup to external drive.
Manual: Copy to a second external drive once per week / month / year / oops-forgot and drop it off at "Mom's house".
Lots of software in the field, but here are a few: there's rsync and Time Machine on Mac, and DeltaCopy (www.aboutmyip.com/AboutMyXApp/DeltaCopy.jsp) for Windows.
Frequent Remote Backup
There are a pile of services that enable you to back up across your internet connection to a remote data centre. Amazon's S3 service + JungleDisk's client software is a strong choice these days - not the cheapest option, but you pay for what you use, and Amazon's track record suggests it will be in business as long as or longer than any other storage provider hanging out their shingle today.
Did I mention it should be encrypted? Props to JungleDisk for handling the "encryption issue" and future-proofing (open source library to interoperate with Jungle Disk) pretty well.
All of the above.
Some people call it being paranoid ... others think to themselves "Ahhh, I can sleep at night now".
Also, it's more fault-tolerance than backup, but you should check out Drobo - basically it's dead simple RAID that seems to work quite well.
Here are the features I'd look out for:
As near to fully automatic as possible. If it relies on you to press a button or run a program regularly, you will get bored and eventually stop bothering. An hourly cron job takes care of this for me; I rsync to the 24x7 server I run on my home net.
Multiple removable backup media so you can keep some off site (and/or take one with you when you travel). I do this with a warm-pluggable SATA drive bay and a cron job which emails me every week to remind me to change drives.
Strongly encrypted media, in case you lose one. The linux encrypted device support (cryptsetup et al) does this for me.
Some sort of point-in-time recovery, but consider carefully what resolution you want. Daily might be enough - having multiple backup media probably gets you this - or you might want something more comprehensive like Apple's Time Machine. I've used some careful rsync options with my removable drives: every day creates a fresh snapshot directory, but files which are unchanged from the previous day are hard linked instead of copied, to save space.
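The hard-linked snapshot idea in the last point can be done with rsync's --link-dest option. A rough sketch, where the source directory and the removable drive's mount point are placeholders:

today=$(date +%F)
# Unchanged files become hard links to yesterday's snapshot instead of new copies.
rsync -a --delete --link-dest=/mnt/backup/latest ~/work/ "/mnt/backup/$today/"
ln -sfn "/mnt/backup/$today" /mnt/backup/latest   # update the 'latest' pointer

On the very first run /mnt/backup/latest does not exist yet, so rsync simply makes a full copy and warns about the missing link-dest directory.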
Or simply set up a Gmail account and mail it to yourself :) Unless you're a bit paranoid about Google knowing about your stuff, since you said research. It doesn't help you much with structure and the like, but it's free, big storage, and off-site, so quite safe.
If you use OS X 10.5 or above then the cost of Time Machine is the cost of an external hard drive. Not only that, but the interface is dead simple to use. Open the folder you wish to recover, click on the time machine icon, and browse the directory as if it was 1999 all over again!
I haven't tried to encrypt it, but I imagine you could use truecrypt.
Yes this answer was posted quite some time after the question was asked, however I believe it should help those who stumble across this posting in the future (like I did).
Set up a Linux or xBSD server:
- Set up a source control system of your choice on it.
- Mirrored RAID (RAID 1) at minimum.
- Daily (or even hourly) backups to external drive[s].
From the server you could also set up an automatic off-site backup. If the internet is out, you'd still have your external drive, and it would just auto-sync once the connection comes back.
Once it's set up, it should be about zero work.
You don't need anything "fancy" for off-site backup. Get a webhost that allows storing non-web data, and sync via SFTP or rsync over SSH. Store the data on the other end in a TrueCrypt container if you're paranoid.
If you work for an employer or as a contractor, also ask them. Most places already have something in place or will let you work with their IT.
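A minimal rsync-over-SSH push to such a host might look like this (the host, user and paths are placeholders):

# Sketch: push the local backup directory to a remote host over SSH.
rsync -az -e ssh ~/backup/ user@backuphost.example.com:backups/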
My vote goes to dirvish (for Linux). It uses rsync as a backend but is very easy to configure.
It makes automatic, periodic, differential backups of directories. The big benefit is that it creates hard links to all files not changed since the last backup, so restore is easy: just copy the last created directory back, instead of restoring all diffs one after another like other differential backup tools need to do.
I have the following backup scenarios and use rsync scripts to store on USB and network shares.
(weekly) Windows backup for "bare metal" recovery
Content of System drive C:\ using Windows Backup for quick recovery after physical disk failure, as I don't want to reinstall Windows and applications from scratch. This is configured to run automatically using Windows Backup schedule.
(daily and conditional) Active content backup using rsync
Rsync takes care of all changed files from the laptop, phone, and other devices. I back up the laptop every night and after significant changes in content, like importing recent photo RAWs from an SD card to the laptop.
I've created a bash script that I run from Cygwin on Windows to start rsync: https://github.com/paravz/windows-rsync-backup
If you're using deduplication, STAY AWAY from JungleDisk. Their restore client makes a mess of the reparse point and makes the file unusable. You can hopefully fix it in safe mode with:
fsutil reparsepoint delete <filename>
