I have a system set up in Microsoft Azure where an Azure VM connects to Azure Blob Storage and downloads a file for processing. A new output file is generated and then uploaded back into Azure Blob Storage. The output file is several orders of magnitude larger than the input file.
The Azure VM accesses the blob storage through an endpoint like "https://xxxxxx.blob.core.windows.net/", where xxxxxx is the storage account name, redacted for privacy.
My question is: when I upload the output file into Azure Blob Storage through that endpoint, does the traffic from the VM count as egress to the internet, i.e. is it chargeable? I have trawled through the documentation on the Microsoft website and even spoken directly with a Microsoft sales representative, and I get conflicting information.
For example, you can see this on the MS website: Azure Screenshot. But the MS representative was adamant that it would be charged. Obviously this has huge implications for our costs. In fact, as ingress traffic is free, it may even prove cheaper to host the application outside the Azure cloud!
So, can someone set me straight: will this bandwidth be chargeable? If so, is there a way to avoid the charge, through some special VNet peering or something?
Thanks Stack Overflow Community!
So, after much experimentation, I have concluded that all traffic between Microsoft services and your VMs is free. This is true even if you connect to them from an external IP address, provided that you connect from within the same data centre (e.g. EU North). This was tested with over 6 TB of upload from an Azure VM to Azure Blob Storage without any cost incurred.
There is a rumour that this might change when Microsoft Azure starts to charge for bandwidth between Availability Zones in early 2021. So, if you're relying on this information in the future, I advise you to double-check and experiment before you commit to any huge data transfers.
I would say it depends on the region: if both the VM and the blob are really in the same region, and especially in the same VNet and availability zone, it shouldn't be charged.
My recommendation is to test, and if it does turn out to be charged you can open a support request to get the details; they will explain why it was charged and whether there is a workaround.
There is a misunderstanding here about outbound data transfer: it means data going out of the Azure data center. That is the basic rule. All Azure resources are located in an Azure data center, and "in the same region" means in the same data center, so the transfer stays on the data center's internal network without going through the Internet and is not charged. Different regions, on the other hand, mean different data centers, so the data transfer leaves the data center and is charged.
To avoid the charge for requests from the VM to Azure Blob Storage, the first thing is to put both the VM and the storage account in the same region. You can also use a private endpoint for the storage account; that way the requests stay within the same VNet and do not go through the Internet, so they are not chargeable. Here are the steps to achieve it.
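Once the private endpoint is set up, a quick sanity check you can run from the VM is to confirm that the blob hostname now resolves to a private IP inside your VNet rather than a public one. A minimal sketch in Python (the hostname is a placeholder for your own storage account):

    # Verify that the storage endpoint resolves to a private address,
    # i.e. that requests will stay on the VNet / private endpoint.
    import ipaddress
    import socket

    host = "xxxxxx.blob.core.windows.net"  # placeholder: your storage account endpoint
    resolved = socket.gethostbyname(host)

    print(f"{host} -> {resolved}")
    print("private address (traffic stays in the VNet):",
          ipaddress.ip_address(resolved).is_private)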
I think we have gone slightly wrong in the way we have used Azure Storage in a SaaS system. We created a storage account per client (security was the prime consideration) and containers per system area, e.g. Vehicle, Work, etc.
Having done further reading, it seems the suggestion is that we should have used one account for all clients. Each client would have a container (so we can create it programmatically) which we then secure, and files would just be structured using a "virtual" folder structure, e.g. a container called "Client A", with files for Jobs (in the Work area of the system) stored like Work/Jobs/{entity id}/blah.pdf. Does this sound sensible?
If so, we now have about 10 accounts that we need to restructure. Are there any tools that will let us easily copy one account's contents into another account's containers? I appreciate we probably can't move the files between accounts (we set them up ages ago, so we can't use the native copy function), so I guess some sort of copy. There are GBs of files across all the accounts.
It may not be such a bad idea to keep different storage accounts per client. The benefits of doing that (to me) are:
Better security as mentioned by you.
You'll be able to achieve better throughput per client, as each client will have their own storage account. If you keep one storage account for all clients and one client starts hitting that account hard, the other clients will be impacted.
Better scalability. Each storage account can hold up to 200 TB of data. So if you keep just one storage account and assuming each client consumes 100 GB of data, you'll be able to accommodate only 2000 clients (I hope my math is right :)). With individual storage accounts, you won't be restricted in that sense.
There are some downsides as well. Some of them are:
Management would be a nightmare. Imagine you have 2000 customers: you would end up managing 2000 storage accounts.
You may be limited by Windows Azure. Currently, by default, you get about 10 or 20 storage accounts per subscription, and you would need to contact support to raise that limit manually. They can do that for you, but I imagine you would want a self-service model where you can create as many storage accounts as you want without contacting support.
Now, coming to your question about tooling: you could write something of your own that makes use of the Copy Blob functionality, which allows you to copy blob data across storage accounts asynchronously. Basically, this is what you would do (a rough sketch follows the steps):
First create a blob container for each client in the target storage account.
Enumerate all blob containers in source storage account.
For each blob container in source storage account, enumerate the blobs.
Copy each blob asynchronously to target storage account in the client's blob container.
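If you'd rather script this than use a tool, here is a rough sketch of those four steps with the current azure-storage-blob Python SDK (pip install azure-storage-blob). The account names, key and connection string are placeholders; under the hood this is the same asynchronous, server-side Copy Blob operation.

    from datetime import datetime, timedelta
    from urllib.parse import quote

    from azure.core.exceptions import ResourceExistsError
    from azure.storage.blob import (BlobSasPermissions, BlobServiceClient,
                                    generate_blob_sas)

    SOURCE_ACCOUNT = "<source-account-name>"      # placeholder
    SOURCE_KEY = "<source-account-key>"           # placeholder
    TARGET_CONN_STR = "<target-connection-string>"  # placeholder

    source = BlobServiceClient(
        account_url=f"https://{SOURCE_ACCOUNT}.blob.core.windows.net",
        credential=SOURCE_KEY)
    target = BlobServiceClient.from_connection_string(TARGET_CONN_STR)

    for container in source.list_containers():
        # 1. Create a matching container in the target account.
        try:
            target.create_container(container.name)
        except ResourceExistsError:
            pass

        # 2. & 3. Enumerate the blobs in each source container.
        source_container = source.get_container_client(container.name)
        for blob in source_container.list_blobs():
            # Give the copy operation temporary read access to the source blob.
            sas = generate_blob_sas(
                account_name=SOURCE_ACCOUNT,
                container_name=container.name,
                blob_name=blob.name,
                account_key=SOURCE_KEY,
                permission=BlobSasPermissions(read=True),
                expiry=datetime.utcnow() + timedelta(hours=24))
            source_url = (f"https://{SOURCE_ACCOUNT}.blob.core.windows.net/"
                          f"{container.name}/{quote(blob.name)}?{sas}")

            # 4. Kick off the asynchronous server-side copy into the target account.
            target.get_blob_client(container.name, blob.name) \
                  .start_copy_from_url(source_url)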
If you're a PowerShell fan, you can also look into Cerebrata's Azure Management Cmdlets (http://www.cerebrata.com/Products/AzureManagementCmdlets), which wrap this functionality. I could have recommended Cerebrata's Azure Management Studio as well, but I haven't tried this functionality there just yet. [Disclosure: I'm one of the devs on the Cerebrata team.]
Hope this helps.
Adding to Gaurav Mantri's answer...
You can have a shared storage account for customers and use a Shared Access Signature (SAS) to limit access to a particular container or blobs (as well as to tables and queues)...
http://msdn.microsoft.com/en-us/library/windowsazure/hh508996.aspx
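For example, with the current azure-storage-blob Python SDK you can hand each client a SAS URL scoped to just its own container instead of the account key. A minimal sketch; the account name, key and container name are placeholders:

    # Issue a short-lived, read/list-only SAS for one client's container.
    from datetime import datetime, timedelta

    from azure.storage.blob import ContainerSasPermissions, generate_container_sas

    ACCOUNT_NAME = "sharedaccount"   # placeholder: the shared storage account
    ACCOUNT_KEY = "<account-key>"    # placeholder
    CLIENT_CONTAINER = "client-a"    # one container per client

    sas_token = generate_container_sas(
        account_name=ACCOUNT_NAME,
        container_name=CLIENT_CONTAINER,
        account_key=ACCOUNT_KEY,
        permission=ContainerSasPermissions(read=True, list=True),
        expiry=datetime.utcnow() + timedelta(hours=1))

    # The client uses this URL and cannot see any other client's container.
    client_url = (f"https://{ACCOUNT_NAME}.blob.core.windows.net/"
                  f"{CLIENT_CONTAINER}?{sas_token}")
    print(client_url)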
I have a web role that talks to Azure Storage, Azure Shared Cache Service and Azure SQL Databases. It is only ever the web roles that communicate with these storage mediums, and never the client browser. The Azure Table Storage contains sensitive data, but the cache and SQL databases do not.
The question is: if all data access goes over plain HTTP, is there a risk that someone can intercept my packets and read my storage key? If so, who can sniff these packets: just Microsoft employees, or do I need to worry about other Azure tenants that might have effected a jailbreak?
A few things to consider:
If your web role and storage accounts are in the same data center, then the traffic is contained within the data center; in that case, going over HTTP would not create any problems IMO. However, if the web role and storage accounts are in different data centers, then definitely make use of HTTPS.
Since you never send your storage account key with your requests to storage, you can rest assured on that part. What you do is sign the requests using your key (or the storage client library does) and send that signature as part of your requests. I don't think one would be able to reverse-engineer that signature to get your storage account key.
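To illustrate the point, here is a simplified sketch (not the exact canonicalized string-to-sign Azure uses): the client computes an HMAC over the request using the key and sends only the resulting signature, so the key itself never travels over the wire.

    import base64
    import hashlib
    import hmac

    # Dummy key for illustration only; the real one never leaves the client.
    account_key = base64.b64encode(b"dummy-account-key-for-illustration").decode()

    # Placeholder for the canonicalized request (verb, headers, resource path, ...).
    string_to_sign = "GET\n\n\nx-ms-date:<date>\n/myaccount/mycontainer"

    signature = base64.b64encode(
        hmac.new(base64.b64decode(account_key),
                 string_to_sign.encode("utf-8"),
                 hashlib.sha256).digest()).decode()

    # Only this goes out with the request, e.g.
    # Authorization: SharedKey myaccount:<signature>
    print("SharedKey myaccount:" + signature)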
HTH.
In addition to the previous answers, you should also take a look at the official security whitepaper: Windows Azure Security Overview. It talks about how isolation and packet filtering secure communication within the data center.
Hi all, I don't understand how Azure Storage came to be charged for around 34 GB in my subscription. We haven't used that much storage space.
I heard there is a Quest tool for exploring Azure Storage (an Azure storage explorer). How useful is that?
Many Thanks.
Are you using Virtual Machines? If that's the case, you have to know that persisted disks are stored as page blobs in your storage account, and you're charged for that. The pricing details page explains why:
"Compute hours do not include any Windows Azure Storage costs associated with the image running in Windows Azure Virtual Machines. These costs are billed separately. For a full description of how compute hours are calculated, please refer to the Cloud Services section."
If you want more detail on how much data you've used per storage account/day/location/... I suggest you take a look at the subscriptions page. After choosing a subscription, you can export a detailed CSV file that you can analyse.
I know Azure will geo-replicate a copy of the current storage account to another location.
My question is: can I access the other location in my program, even just read-only?
I ask because this would allow me to build another deployment in a different geo-location for performance and disaster-proofing, like Azure does. With the current setup, if I use the same source storage from a different geo-location, I have to pay extra bandwidth costs.
You can only access your storage account by its primary name. In the event of failover, that name will be mapped to the alternate datacenter. You cannot access the failover storage directly, nor can you choose when to trigger a failover. For a multi-site setup as you described, you'd need to duplicate your data (which would then add the cost of storage in datacenter #2). This does give you ultimate flexibility in your DR and performance planning, but at an added cost of storage and bandwidth (egress-only).
Last week the storage team announced read-only access to the failover storage: Windows Azure Storage Redundancy Options and Read Access Geo Redundant Storage.
This means you can now deploy your application in a different datacenter which can be used for "full" failover (meaning that the storage will also be available there). Even if it's only read-only, your application will still be online - but simply in "degraded" mode.
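For instance, with Read Access Geo Redundant Storage enabled, reads can be pointed at the secondary endpoint, which is the account name with a "-secondary" suffix. A minimal sketch using the current azure-storage-blob Python SDK; the account, key, container and blob names are placeholders:

    from azure.storage.blob import BlobServiceClient

    ACCOUNT = "<account-name>"   # placeholder
    KEY = "<account-key>"        # placeholder

    # Read-only client against the geo-replicated secondary endpoint.
    secondary = BlobServiceClient(
        account_url=f"https://{ACCOUNT}-secondary.blob.core.windows.net",
        credential=KEY)

    blob = secondary.get_blob_client("mycontainer", "myblob.pdf")
    data = blob.download_blob().readall()  # reads work; writes are rejected on the secondary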
The steps on how you can implement this with traffic manager are described here: http://fabriccontroller.net/blog/posts/adding-failover-to-your-application-with-read-access-geo-redundant-storage-and-the-windows-azure-traffic-manager/
I'm trying to get up and going with Windows Azure. I understand that I need to create a "Storage Account". However, what I'm confused about is how I should set it up. For instance, my Azure subscription is set to my company name. I intend to have multiple ASP.NET web applications (web roles) associated with my subscription. Each web application will have its own database.
My question is, should each web application have its own storage account? Or should only one storage account be used for all of my projects?
Thank you!
There's no one way to answer this, but here are some thoughts to help your decision:
Each storage account is limited to 100TB. If you feel that you will push the limits of this across multiple websites, then create multiple storage accounts for sure.
To make billing easier, I'd suggest separate storage accounts
Storage accounts have a scalability target of a few thousand transactions per second across the entire storage account. For performance purposes, it's probably better to have separate storage accounts
Consider putting your diagnostic data in a separate storage account. This way, you can safely give your Storage Account key to a 3rd-party like ParaLeap (creators of AzureWatch) for monitoring your app, while not giving away the key to real customer data, for instance.
If you need more than 5 storage accounts, you'll need to contact Customer Support to increase this number.
Windows Azure Storage is for simple blob storage: it's for when your app needs a file store. Any application, not just Azure web roles, can target the storage service. It's kind of like Amazon S3, if you're familiar with that.
Storage services are not required to run Azure applications. You just need a "compute" instance.