Optimizing downloads from Azure Blob Storage with 1000 concurrent users [closed] - azure

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
I have a question regarding the optimization of Azure blob download speeds. I am looking at having a private container in Azure blob storage with 10000 files of size ~5 MB. Whenever an user wants to download this file, I will be generating a SAS Url for the user to download the file. As of now, I am looking at ~1000 concurrent users downloading various files at any point of time.
I would like to know whether any of the below steps will help me to maintain optimal download speeds for this kind of usage.
Will storing the files across different containers help in improving download speeds.
Read in the Windows Azure storage team's blog that each storage account has a fixed bandwidth. To offset this , do I need to storing the files across different storage accounts.
Is it sufficient to have a single container in a storage account to get the best download speeds for ~1000 concurrent users.
I
t will also be great if you can let me know the best practices to achieve this.

No. Blobs are the partitions, not the containers.
Yes.
It depends. The target throughput of a single blob is up to 60 MBytes/sec, but since you're talking about 10000 files this shouldn't be a problem (assuming your 1000 concurrent users will download different files). What you'll need to look at is the scalability target of the storage account, where the throughput is up to 3 gigabits per second. This could become an issue if your application grows, but there are a few solutions you can look at:
Use multiple storage accounts (maybe one per country, per application, ...). The limit for creating storage accounts is pretty low (it used to be 5 storage accounts per subscription, don't know if this changed), so you'll need to contact Microsoft is you want to use more storage accounts.
Think about using the CDN together with blob storage to expose your files. This will improve performance (more throughput) but your users will also download the files much faster since they download from a 'nearby' location.
You can also do some caching in your Web Roles (in LocalResources for example, or the Caching Preview, to cache your popular files). But I wouldn't advise on doing this.
This article is a good place to start: Windows Azure Storage Abstractions and their Scalability Targets

Related

Dedicated or shared Storage Account for Azure Function Apps with the names less than 32 symbols

Short Version
We want to migrate to v4 and our app names are less than 32 symbols.
Should we migrate to dedicated Storage Accounts or not?
Long Version
We use Azure Functions v3. From start one Storage Account was shared between 10+ Azure Function Apps. It could be by luck but the names are less than 32 symbols and it is not going to change. We are not using slots as they were initially not recommended and then with no adoption time or recommendation made generally available.
Pre-question research revealed this question but it looks like more related to the durable functions. Another question looks more up the point but outdated and the accepted answer states that one Storage Account can be used.
Firstly, the official documentation has a page with storage considerations and it states (props to ijabit for pointing to it.):
It's possible for multiple function apps to share the same storage account without any issues. For example, in Visual Studio you can develop multiple apps using the Azure Storage Emulator. In this case, the emulator acts like a single storage account. The same storage account used by your function app can also be used to store your application data. However, this approach isn't always a good idea in a production environment.
Unfortunately it does not elaborate further on the rationale behind the last sentence.
The page with best practices for Azure Function mentions:
To improve performance in production, use a separate storage account for each function app. This is especially true with Durable Functions and Event Hub triggered functions.
To my greater confusion there was a subsection on this page that said "Avoid sharing storage accounts". But it was later removed.
This issue is somehow superficially related to the question as it mentions the recommendation in the thread.
Secondly, we had contacted Azure Support for different not-related to this question issues and the two different support engineers shared different opinions on the current issue. One said that we can share a Storage Account among Functions Apps and another one said that we should not. So the recommendation from the support was mixed.
Thirdly, we want to migrate to v4 and in the migration notes it is stated:
Function apps that share storage accounts will fail to start if their computed hostnames are the same. Use a separate storage account for each function app. (#2049)
Digging deeper into the topic, the only issue is the collision of the function host names that are used to obtain the lock that was known even in Oct 2017. One can follow the thread and see how in Jan 2020 the recommendation was made to update the official Azure naming recommendation but it was made only on late Nov 2021. I also see that a non-intrusive, i.e. without renaming, solution is to manually set the host id. The two arguments raised by balag0 are: single point of failure and better isolation. They sound good from the perspective of cleaner architecture but pragmatically I personally find Storage Accounts reliable, especially if read about redundancy or consider that MS is dog-fooding it for other services. So it looks more like a backbone of Azure for me.
Finally, as we want to migrate to v4, should we migrate to dedicated Storage Accounts or not?
For the large project with 30+ Azure Functions I work on, we have gone with dedicated Storage Accounts. The reason why is Azure Storage account service limits. As the docs mention, this really comes into play with Durable Task Functions, but can also come into play in other high volume scenarios. There's a hard limit of 20k requests per second for a Storage Account. Hit that limit, and requests will fail and will return HTTP 429 responses. This means that your Azure Function invocation will fail too. We're running some high-volume scenarios and ran into this.
It can also cause problems with Durable Task Functions if two functions have the same TaskHub ID in host.json. This causes a collision when Durable Task Framework does its internal bookkeeping using Storage Queues and Table Storage, and there's lots of pain and agony as things fail in spectacular fashion.
Note that the 20k requests per second service limit can be raised with a support ticket to Azure. If approved, the max they'll raise it to is 50k requests/second.
So avoid the potential headaches and go with a Storage Account per Function.

Azure storage service architecture

I am using Azure storage service for my application
I need to store some organisation data like images, documents, videos etc for my application.
In my application user from 50 organisations upload their data.
We have following concerns
1) each company will use only 10 GB space. If their user-data exceeds 10 GB then there will be no access for storage.
Is it feasible?
2) What is best architecture we can design? For example I have container for each 5 organisations and year folder like 2018/2017 and then sub-folders inside each years like image/doc/videos
So I will have 5 container then years folder with 3 sub-folders each.
Heirarchy will like
organisation (container) - > year (folder) - > three sub-folders (image/doc/videos)
Then is it possible to restrict/grant access to years (folder)?
Please suggest?
I have abstracted three questions from your revised question (I think)
Can we limit a storage account container to a specific size (like 10 GB)? (Or is there an approach we can implement to achieve this)
No there is no ability to set a quota on a given container in a storage account. To do that you will need to implement the size check business logic in an API that is fronting the storage account and containers.
What architecture is suggested given the data provided? (organization, year, type, etc.)
The container structure suggested will work as long as you implement an API above it for controlling access and to enforce business rules and storage placement.
Can you restrict/grant access to containers in the Azure storage account (aka folders)?
You have to use SAS tokens if you want to secure containers in an Azure storage account. Broadly speaking with your requirements you need to look at implementing a middle-man service like an API (through Functions / API Management / Logic Apps) that will implement your storage routing, business logic and security rules.

How to store (and query) the MaxMind GeoIP2 database in Azure?

In an Azure Web App I need to efficiently query the MaxMind GeoIP2 City Database (due to the volume of queries and the latency requirements we cannot use the MaxMind's rest API).
I'm wondering what's the best approach for storing the db (binary MMDB format, accessed via the official .NET api) so that it's easy to update with minimal downtime (we are going to subscribe Monthly updates) and still cost effective as to what regards Azure storage and transactions.
Apparently block blobs are the way to go, but I'm not sure about the monthly updates and the fact that the GeoIP2 api load in memory the whole db (I do not know if this would be a problem for the Web App, if I need a web worker to keep it up or I need something else), but actually I do not know yet how large the file is.
What's the most cost effective solution that preserve low latency over a huge volume?
According to the API docs you must have the database available in a file system (the API doesn't know anything about Azure storage and related REST API). So, regardless where you permanently store it, you'll need to have it on a disk somewhere.
I have no idea how large the database footprint is, but Web Apps, Cloud Services (web/worker roles) and Virtual Machines (whether Linux or Windows) all have local disks. And you have read/write access to these disks. So, you'd need to copy the database binary file (or csv) to local disk from somewhere. At this point, when you initialize the SDK, you'd create a DatabaseReader and point it to your locally-downloaded copy of the database file.
You mentioned storing the database in blob storage. There's nothing stopping you from doing so and simply downloading a copy to local disk. And there's nothing stopping you from storing multiple versions in multiple blobs. Note: You may also take advantage of Azure File storage (an SMB share). Which you choose is up to you.
As far as most cost effective solution: You'll need to do the pricing workup yourself to see what's most effective. You'd also need to evaluate how much RAM is available for the given size VM/role instance/Web App you choose. You mentioned Web Apps in your question: Web App instances scale from 0.5GB to 14GB, depending on the tier you choose (again, you'll need to evaluate this).

About windows azure blob storage, the implementation of a project should not depends on the cloud platform

We plan to migrate the existing website to Windows azure, and i have been told that we need to store files to blob storage.
My questions is:
If we want to use blob storage, that means i need to re-write the file storage function(we use file system for now), call blob service api to store files, that's very strange for me just because we want to use windows azure, how about in the future we want to use Amazon EC2 or other cloud platform, they might have there own way to store file, then may be i need to re-write the file storage function again, in my opinion , the implementation of a project should not depends on the cloud platform(or cloud server)! Can any body correct me, thanks!
I won't address the commentary about whether an app should have a dependency on a particular cloud environment (or specific ways to deal with that particular issue), as that's subjective and it's a nice debate to have somewhere else. What I will address is the actual storage in Azure, as your info is a bit out-of-date.
One reason to use blob storage directly (and possibly the reason you were told to use blob storage) is that it provides access from multiple instances of your app. Also, blob storage provides 500TB of storage per storage account, and it's triple-replicated within the deployed region (and optionally geo-replicated). With attached storage (either with local disk or blob-backed Azure Disk), the access is specific to a particular instance of your app. Shifting from file system access to blob storage access does require app modification.
If you choose not to modify your app's file I/O operations, then you can also consider the new Azure File Service, which provides SMB access to storage (backed by blob storage). Using File Service, your app would (hopefully) not need to be modified, although you might need to change your root path.
More information on Azure File Service may be found here.
Why does it seem strange? You need to store your files somewhere and the cloud is a good a place as any IF it suits your needs. The obvious advantages are redundancy and geo replication, sharing files across multiple projects and servers, The list goes on. It's difficult to advise on whether it would be a good idea or not without hearing some specifics.
You could use windows azure storage with amazon in the future if you wanted to (you'd just need to set up the access for it), obviously with slighter longer delay. Then again that slight performance drop may be significant and you may end up re-writing it.
Most importantly, swapping over from one cloud provider to another is not trivial depending on just how much you use it or how much data you've got in it, so I would strongly suggest looking at the advantages / disadvantages of each platform closely before putting your lot in with either one and then fully learn that platform.
Personally, I went for Azure cloud services + storage etc even though it was slightly more expensive at the time, because i'm a Microsoft Person (not that I didn't do my research). It was annoying in the early days when key features were missing, but it's really matured now and I like the pace that it's improving.
It's cheap to test, why not try both and see which one suits you? A small price to pay when you have big decisions to make.
Disclaimer: I don't know the current state of Amazon web services.
Nice question. We are in the middle of a migration of an old PHP/MySQL/LocalShare to WebRole/SQLAzure/AzureStorage ERP application. We faced the same problem and decision. Let me write some thoughts about the issue :
It is a good option to just be able to switch the storage provider but is it reasonable? You can always build the abstraction but do you plan how to do the actual change of storage provider - migration/sync while in production? What kind of argument will exactly drive the transition to another storage provider? How much users and data do you have? Do you plan to shard-rebalance the storage in the future? How reliable must be this system during this storage provider switch? Do you want to totally move the data when you want to switch or you just want to shard it so that you start using this different provider? Does the cost development of these (reliable) storage layers and the cost of development of reliable transitions (or bi-directional syncs) outweighs the money difference between any two storage providers?
Just switching storage mechanism from Azure Blob to Amazon will incur heavy latency penalty if your other services are on Azure - When you create Storage and Services on Azure you set affinity groups by region so that you minimize the network latency.
These are only a few of the questions to answer before doing all the weightlifting. We have abstracted the file repository (blob) because we planned to move from local NFS to Blob transparently and gradually and it answers our needs.

Share data between users in metro application

I would like to create a Metro application that allows a group of people to interact. One person would create data and serve as the owner, and multiple others would be invited in and be allow to modify that data. I heard from Build talks that each Metro application will get per-user Azure storage, but will it be possible to share that data between multiple users? Does anyone have a link they could share where I could research this?
I think that you are confusing SkyDrive with Azure Blob Storage.
SkyDrive
Personal to a Live ID
Not really meant as a base for collaborative work
Azure Blob Storage
You can have public files that anyone can view and update
You can have a lease on file that only allows certain people to edit it
Since you own the Azure account you also control the content
You can learn the basics here
If you want to share private app data between users, the best way to do so would be via a shared server of some sort. You should have a server (running on Azure, Amazon EC2, or anything really) that exposes a REST-ful web service which each application connects to. The shared state then lives on that server.
This is better than trying to use skydrive or some file-based system for storing shared data. With a file on skydrive and multiple users trying to access it, you would run into concurrency issues when more than 1 person tries to write to it.
You don't get Azure with Metro.
With Live you get a free SkyDrive that is a personal cloud storage. Like 10 GB. Can share files but it is via sending an email link. It is not file storage that would readily support a server type application to manage that sharing.
Azure is a cloud platform for file and data sharing. Azure is not free but storage cost is only $0.125 / GB per month. 10 GB = $1.25 / month. Using SkyDrive as shared storage you are giving up a lot of developer and hosting tools that come with Azure to save $1.25 / month.
It looks like there is a more formal definition of this with the updated help now available. They were referring to roaming application data. I found the following links that provide guidance:
http://msdn.microsoft.com/en-us/library/windows/apps/hh464917.aspx
http://msdn.microsoft.com/en-us/library/windows/apps/hh465094.aspx
The general is that a small amount of temporary application data is provided on a per-app, per-user basis. The actual size you get is not detailed, but the guidance is pretty clear - app settings only, no large data sets, and don't use it for instant synchronization. Given this guidance, my plan is not a good one and will change.

Resources