I've noticed that Azure blob storage now has the option to encrypt your data at rest.
As far as I can tell there's no financial cost, and there's no documentation anywhere that states how much of a performance impact it has on access speeds (if any).
My question is, is there a good reason to not turn it on in most cases?
I imagine that if you've got a scenario where every millisecond counts and security isn't an issue (public containers, perhaps), then maybe you wouldn't want to, but otherwise it sounds like a nice feature to turn on for free. There's no such thing as a free lunch, but I can't find evidence of a downside beyond speculation.
If you enable SSE for your blobs, there is no distinguishable impact on performance.
The only case where you might think twice about it is if the storage account only holds VHD files that are being used by VMs. For those it is better to use Azure Disk Encryption, which uses DM-Crypt for Linux VMs and BitLocker for Windows VMs. BitLocker, for example, will go back and encrypt everything already on the disk.
SSE only encrypts newly written data. This means that if you have a storage account with 100 GB of existing data and you enable SSE, that 100 GB remains unencrypted until you rewrite it (for example, by copying it to another container). Any blobs added to the storage account after SSE is enabled WILL be encrypted.
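To make that concrete, here is a minimal Python sketch (assuming the azure-storage-blob v12 package; the connection string and container name are placeholders) that rewrites each existing blob so the data goes back through the write path and picks up SSE. Copying everything to a second container, as described above, achieves the same thing.

```python
# Sketch: force existing blobs through the write path so SSE applies to them.
# Assumes the azure-storage-blob (v12) package; names below are placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("mycontainer")

for blob in container.list_blobs():
    blob_client = container.get_blob_client(blob.name)
    data = blob_client.download_blob().readall()    # read the existing (unencrypted) blob
    blob_client.upload_blob(data, overwrite=True)   # re-upload; the new write is encrypted
```

For large blobs you would stream rather than hold each one in memory, but the idea is the same.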
We are going to have a new business system and I'm trying to convince my boss to host it in the cloud in China (Azure, AWS, etc.), because that's where the business is. He is concerned about data confidentiality and doesn't want the company's financial information to leak out. The software vendor even suggested we build our own data center if we are so concerned about confidentiality, which makes it even harder to convince him. He has the impression that anything can be done in China.
I understand that Azure SQL is not an option for me because the host admin still has control even if I implement TDE (and I cannot use Always Encrypted). Now I'm looking at a VM, where I have full control from the VM level up, and I can also use disk encryption. Coupled with other security measures like SSL, I'm hoping this will improve the security of the data both in transit and at rest. Is my understanding correct?
With that said, can an Azure admin still override anything set on the VM and take it over completely?
Even if a takeover is technically possible, as long as it takes a lot of effort (benefit < effort), the protections are still worth putting in place.
Any advice will be much appreciated.
An Azure-level admin can just log in to your VM, whether it is encrypted or not (or decrypt it, for that matter). You cannot really protect yourself from somebody inside your organization doing what they are not supposed to do (you can mitigate it to some extent with things like Privileged Identity Management, proper RBAC, etc.).
If you are talking about an Azure fabric admin (a person working for Microsoft, or for the Chinese operator in this particular case): he can obviously pull the hard drive and get access to your data, but it is encrypted at rest, so chances are he cannot decrypt it. If you encrypt the VM on top of that with Azure Disk Encryption (or Transparent Data Encryption) using your own keys, he wouldn't be able to decrypt the data even if he could somehow get past the Azure-side encryption.
If you want more control, IaaS services are better than PaaS; you simply have more control over IaaS. You can use BitLocker to encrypt your disks if you are using a Windows OS. The China data centers are also covered by industry-specific compliance standards, and access to your customer data is controlled by an independent company in China, 21Vianet; not even Microsoft can access your data without approval and oversight by 21Vianet. I don't think there is a big risk, but you do have to implement more security mechanisms than Azure provides out of the box.
In an Azure Web App I need to efficiently query the MaxMind GeoIP2 City database (due to the volume of queries and the latency requirements, we cannot use MaxMind's REST API).
I'm wondering what the best approach is for storing the database (binary MMDB format, accessed via the official .NET API) so that it's easy to update with minimal downtime (we are going to subscribe to monthly updates) and still cost effective with regard to Azure storage and transactions.
Apparently block blobs are the way to go, but I'm not sure about the monthly updates, or about the fact that the GeoIP2 API loads the whole database into memory (I don't know whether that is a problem for the Web App, whether I need a worker to keep it loaded, or whether I need something else); I actually don't know yet how large the file is.
What's the most cost-effective solution that preserves low latency over a huge volume of queries?
According to the API docs you must have the database available in a file system (the API doesn't know anything about Azure storage and related REST API). So, regardless where you permanently store it, you'll need to have it on a disk somewhere.
I have no idea how large the database footprint is, but Web Apps, Cloud Services (web/worker roles) and Virtual Machines (whether Linux or Windows) all have local disks. And you have read/write access to these disks. So, you'd need to copy the database binary file (or csv) to local disk from somewhere. At this point, when you initialize the SDK, you'd create a DatabaseReader and point it to your locally-downloaded copy of the database file.
You mentioned storing the database in blob storage. There's nothing stopping you from doing so and simply downloading a copy to local disk. And there's nothing stopping you from storing multiple versions in multiple blobs. Note: You may also take advantage of Azure File storage (an SMB share). Which you choose is up to you.
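As an illustration of that pattern, here is a minimal Python sketch, using the geoip2 package rather than the .NET API mentioned in the question (the connection string, container, blob, and local path are placeholders): download the .mmdb from blob storage to local disk on startup, then open it with a reader.

```python
# Sketch: copy the MaxMind database from blob storage to local disk, then query it.
# Assumes the azure-storage-blob (v12) and geoip2 packages; names are placeholders.
import geoip2.database
from azure.storage.blob import BlobClient

LOCAL_PATH = "/home/site/GeoIP2-City.mmdb"   # local-disk path for this instance

blob = BlobClient.from_connection_string(
    "<storage-connection-string>",
    container_name="geoip",
    blob_name="GeoIP2-City.mmdb",
)
with open(LOCAL_PATH, "wb") as f:
    f.write(blob.download_blob().readall())   # pull the current copy to local disk

reader = geoip2.database.Reader(LOCAL_PATH)   # loads the db for fast local lookups
city = reader.city("203.0.113.5")
print(city.city.name, city.country.iso_code)
```

A monthly update then amounts to uploading the new .mmdb to the blob and having each instance re-run the download (or restart).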
As far as the most cost-effective solution goes: you'll need to do the pricing workup yourself to see what's most effective. You'd also need to evaluate how much RAM is available for the size of VM / role instance / Web App you choose. You mentioned Web Apps in your question: Web App instances scale from 0.5 GB to 14 GB, depending on the tier you choose (again, you'll need to evaluate this).
I'm using Azure Virtual Machines, specifically Linux. I went to add a blank disk ("attach...blank disk" in the portal) and discovered that Azure only allows a maximum size of 1023GB for disks. The portal won't allow you to specify a size beyond 1023GB.
What I'm looking for is a 4TB filesystem. The disks present themselves as /dev/xd?. I'm wondering if I could take four 1TB disks and stripe them (RAID 0) in the OS? If they're SAN disks then I'm not concerned about the redundancy since presumably they're already protected. I admit it sounds kind of hokey.
Is there another option to get bigger disks in Azure?
To be clear, I want persistent storage, not the ephemeral /mnt/storage.
You are correct. You need 4 disks in RAID 0 to get 4 TB of usable space. You can follow the guide below; just make sure to change the parameters accordingly, because the guide uses only 3 disks.
Configure Software RAID on Linux
https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-configure-raid/
Regarding redundancy: no matter which replication option you configure for Azure Storage, each disk is kept in at least three copies, so just go for full performance with RAID 0.
Azure Storage Replication
https://azure.microsoft.com/en-us/documentation/articles/storage-redundancy/
For Windows you may use Storage Spaces
http://blogs.msdn.com/b/dfurman/archive/2014/04/27/using-storage-spaces-on-an-azure-vm-cluster-for-sql-server-storage.aspx
https://technet.microsoft.com/en-us/library/hh831739.aspx
We plan to migrate our existing website to Windows Azure, and I have been told that we need to store files in blob storage.
My question is:
If we want to use blob storage, that means I need to rewrite the file storage function (we use the file system for now) to call the blob service API to store files. That feels very strange to me, doing it just because we want to use Windows Azure. What if in the future we want to use Amazon EC2 or another cloud platform? They might have their own way to store files, and I might need to rewrite the file storage function again. In my opinion, the implementation of a project should not depend on the cloud platform (or cloud server). Can anybody correct me? Thanks!
I won't address the commentary about whether an app should have a dependency on a particular cloud environment (or specific ways to deal with that particular issue), as that's subjective and it's a nice debate to have somewhere else. What I will address is the actual storage in Azure, as your info is a bit out-of-date.
One reason to use blob storage directly (and possibly the reason you were told to use blob storage) is that it provides access from multiple instances of your app. Also, blob storage provides 500TB of storage per storage account, and it's triple-replicated within the deployed region (and optionally geo-replicated). With attached storage (either with local disk or blob-backed Azure Disk), the access is specific to a particular instance of your app. Shifting from file system access to blob storage access does require app modification.
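To give a sense of the change involved, here is a hedged Python sketch (assuming the azure-storage-blob v12 package; the connection string, container, and file names are placeholders) of the blob equivalents of a simple local-file write and read.

```python
# Sketch: blob-storage equivalents of simple file system reads/writes.
# Assumes azure-storage-blob (v12); connection string and names are placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("uploads")

# was: write the bytes to a local file on the web server
with open("report.pdf", "rb") as f:
    container.upload_blob(name="report.pdf", data=f, overwrite=True)

# was: read the bytes back from the local file
data = container.get_blob_client("report.pdf").download_blob().readall()
```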
If you choose not to modify your app's file I/O operations, then you can also consider the new Azure File Service, which provides SMB access to storage (backed by blob storage). Using File Service, your app would (hopefully) not need to be modified, although you might need to change your root path.
More information on Azure File Service may be found here.
Why does it seem strange? You need to store your files somewhere, and the cloud is as good a place as any IF it suits your needs. The obvious advantages are redundancy and geo-replication, sharing files across multiple projects and servers; the list goes on. It's difficult to advise on whether it would be a good idea without hearing some specifics.
You could use Windows Azure storage alongside Amazon in the future if you wanted to (you'd just need to set up the access for it), obviously with a slightly longer delay. Then again, that slight performance drop may be significant, and you may end up rewriting it anyway.
Most importantly, swapping from one cloud provider to another is not trivial, depending on how much you use it and how much data you have in it, so I would strongly suggest looking closely at the advantages and disadvantages of each platform before throwing in your lot with either one, and then learning that platform thoroughly.
Personally, I went for Azure cloud services + storage etc., even though it was slightly more expensive at the time, because I'm a Microsoft person (not that I didn't do my research). It was annoying in the early days when key features were missing, but it has really matured now and I like the pace at which it's improving.
It's cheap to test, so why not try both and see which one suits you? A small price to pay when you have big decisions to make.
Disclaimer: I don't know the current state of Amazon web services.
Nice question. We are in the middle of migrating an old PHP/MySQL/local-share ERP application to WebRole/SQL Azure/Azure Storage. We faced the same problem and the same decision. Let me write down some thoughts on the issue:
It is a nice option to be able to just switch the storage provider, but is it reasonable? You can always build the abstraction, but have you planned how to do the actual change of storage provider, i.e. migration or sync while in production? What argument would actually drive the transition to another storage provider? How many users and how much data do you have? Do you plan to shard or rebalance the storage in the future? How reliable must the system be during a storage provider switch? Do you want to move the data completely when you switch, or just shard it so you start using the new provider for new data? Does the cost of developing these (reliable) storage layers, plus the cost of developing reliable transitions (or bi-directional syncs), outweigh the price difference between any two storage providers?
Just switching the storage mechanism from Azure Blob to Amazon will incur a heavy latency penalty if your other services stay on Azure: when you create storage and services on Azure, you place them in the same region (affinity group) precisely to minimize network latency.
These are only a few of the questions to answer before doing all the heavy lifting. We abstracted the file repository (blob) because we planned to move from a local NFS share to blob storage transparently and gradually, and it answers our needs.
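As a sketch of what such an abstraction can look like (the class and method names are hypothetical, and the blob backend assumes the azure-storage-blob v12 Python package), the application codes against a small interface and the provider-specific code stays in one place:

```python
# Sketch: a minimal file-repository abstraction with a local and an Azure Blob backend.
# Names are hypothetical; another provider would be added as one more backend class.
from abc import ABC, abstractmethod
from pathlib import Path
from azure.storage.blob import ContainerClient

class FileRepository(ABC):
    @abstractmethod
    def save(self, name: str, data: bytes) -> None: ...
    @abstractmethod
    def load(self, name: str) -> bytes: ...

class LocalShareRepository(FileRepository):
    def __init__(self, root: str):
        self.root = Path(root)
    def save(self, name: str, data: bytes) -> None:
        (self.root / name).write_bytes(data)
    def load(self, name: str) -> bytes:
        return (self.root / name).read_bytes()

class AzureBlobRepository(FileRepository):
    def __init__(self, connection_string: str, container: str):
        self.container = ContainerClient.from_connection_string(connection_string, container)
    def save(self, name: str, data: bytes) -> None:
        self.container.upload_blob(name=name, data=data, overwrite=True)
    def load(self, name: str) -> bytes:
        return self.container.get_blob_client(name).download_blob().readall()
```

The migration, sync, and latency questions above still apply; the abstraction isolates the code change, not the data move.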
I have deployed a Worker Role to an Azure instance with remote access enabled.
When I remote to the server, I see disks C: and D: on the server.
I was told that Azure doesn't guarantee the durability of data stored on a compute instance. However, when I reboot or upgrade the service, I still see the previous data on disks C: and D:.
When will the data on disks C: and D: be lost?
Local disks are non-durable; in other words, they are not replicated. They may fail at any time, with no way to recover the data.
During role recycles (reboots), data typically will survive, but you cannot count on it surviving.
If your software must use a drive letter because you can't alter the code base, you can mount an NTFS volume inside a Page Blob (basically a Cloud Drive). You can do this from your OnStart(), then pass the drive letter to your app. Note: a cloud drive may only have one writer. So... if you have multiple instances, each instance would need to create its own cloud drive.
Because Azure is a cloud service, the hardware that your instance is running on is not guaranteed to be the same at any given point in time. As a result, you shouldn't rely on the data being present. Even though it may persist across reboots/upgrades, it isn't guaranteed.
See the second paragraph on Local Storage from this article. It makes the following recommendation:
If you require reliable durability of your data, want to share data between instances, or access your data outside of Windows Azure, consider using a Windows Azure Storage account or SQL Azure Database instead
It will usually be there after a reboot, but I have seen one case where I rebooted and something went wrong, so the instance was reset to a clean state. You cannot rely on the data surviving. I would imagine the same thing could happen with an upgrade.
Stopping and starting the instances will also probably lose the data, but I haven't checked.
Here's a quote from an MVP on the MSDN forums:
The local disk storage of Compute VMs (whether Web Role, Worker Role, or VM Role) is not persistent. It can go away at any time. The data center has the right to move and re-create your VMs whenever it deems it necessary. This could happen in response to a hardware failure, or simply because the data center needs to be reorganized. When this happens, you lose your VM disk files and go back to your deployment image. It is only a matter of time before this happens. This is normal behavior for cloud computing compute instances.