Estimating Azure Search Cost

I'm fairly new to Azure platform and need some help with cost estimates for Azure Search service.
Every month we will have about 500 GB worth of files that will be dropped into Azure Blob Storage. We would like to index these files using Azure Search based on the file names alone.
When I look at the Standard S2 pricing, it has the following text for storage:
100 GB/partition (max 1.2 TB documents per service). What does this mean? Does it mean that once my storage crosses 1.2 TB, I'll need to purchase another service?
Any help would be greatly appreciated.

If a tier's capacity turns out to be too low for your needs, you will need to provision a new service at the higher tier and then reload your indexes. Note that there is no in-place upgrade of the same service from one SKU to another.
Storage is constrained by disk space or by a hard limit on the maximum number of indexes, documents, or other high-level resources, whichever comes first.
A service is provisioned at a specific tier. Jumping tiers to gain capacity involves provisioning a new service (there is no in-place upgrade). For more information, see Choose a SKU or tier. To learn more about adjusting capacity within a service you've already provisioned, see Scale resource levels for query and indexing workloads.
Check the documentation for more details on this topic.
S2 currently offers 100 GB of storage per partition, with 12 partitions per service, for a maximum of 1.2 TB per service.
You could use Pricing Calculator (https://azure.microsoft.com/pricing/calculator/) for the cost estimation as well.
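As a rough sanity check on those numbers, here is a minimal sketch in plain Python; the index size is an assumed input, not something derived from your blobs, and the S2 figures are the ones quoted above:

# Back-of-envelope S2 capacity check.
import math

S2_PARTITION_GB = 100     # storage per partition (quoted above)
S2_MAX_PARTITIONS = 12    # partitions per service, i.e. 1.2 TB max

def partitions_needed(index_size_gb):
    return math.ceil(index_size_gb / S2_PARTITION_GB)

index_size_gb = 750       # assumed estimate of your index size
needed = partitions_needed(index_size_gb)
print(f"{needed} partition(s) needed")
print("fits in one S2 service" if needed <= S2_MAX_PARTITIONS
      else "exceeds one S2 service; provision another service")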

The storage limits for Azure Search refer to the size of documents in the index, which can be bigger or smaller than your original blobs depending on your use case.
For example, if you want to do text searches on blob content, the index will be bigger than your original blobs. Whereas if you only want to search on file names, the size of the original blobs is irrelevant and only the number of blobs affects your Azure Search index size.
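To make that concrete, here is a minimal sketch of the filename-only estimate; the per-document byte figure and the blob count are illustrative assumptions, not published numbers:

# Rough index-size estimate when only file names are indexed: the
# index grows with the number of blobs, not their size.
BYTES_PER_DOC = 1_000          # assumed footprint per document (key, name, overhead)
blobs_per_month = 1_000_000    # assumed count; the 500 GB figure alone doesn't tell us this

index_gb_per_month = blobs_per_month * BYTES_PER_DOC / 1024**3
print(f"~{index_gb_per_month:.2f} GB of index growth per month")   # ~0.93 GB

A useful follow-up is to index a representative sample of blobs and read the actual index size from the portal to calibrate the per-document figure.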

Related

Charged for Publishing Azure Functions from Visual Studio?

Do you get billed in Azure each time you publish an Azure Function from VS?
The short answer is yes, you get charged for publishing an Azure Function from Visual Studio. But "each time"? Well, not really.
So, let's get to understand how that works. Although Azure Functions offers 1,000,000 executions per month (subject to execution time and memory), your code has to live somewhere, and that somewhere is the function's storage account.
Storage Accounts pricing could be broken down into two main costs:
Storage:
You pay for storage per month (pay-as-you-go) unless you are on a Premium storage plan. The first 50 TB of blob data in the Hot access tier costs roughly $0.0184 to $0.0424 per GB, depending on where your data is hosted and its redundancy.
In your case, that cost is incurred monthly on whatever you store (the rates above apply to the first 50 TB).
Transfer:
When you deploy your data via Visual Studio, you are effectively making API calls to write your data, and those write operations are charged (again depending on your data's host location and redundancy) per 10,000 operations. That includes every PutBlob, PutBlock, PutBlockList, AppendBlock, SnapshotBlob, CopyBlob, and SetBlobTier call you or your function makes on that storage account. The cost varies from $0.05 to $0.091 per 10,000 operations.
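To see why the per-publish cost stays small, here is a minimal sketch of the math; the operation count per publish is an assumption, and the rate is the upper end of the range above:

# Estimated write-operation cost of publishing a Function app.
COST_PER_10K_OPS = 0.091    # upper end of the range quoted above
OPS_PER_PUBLISH = 100       # assumed PutBlock/PutBlob/etc. calls per publish

publishes_per_month = 1_000
ops = publishes_per_month * OPS_PER_PUBLISH
cost = ops / 10_000 * COST_PER_10K_OPS
print(f"~${cost:.2f}/month for {publishes_per_month} publishes")   # ~$0.91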
Others:
Other costs may be incurred by features such as Blob Index, Blob Changes, and encryption.
Conclusion
Publishing your Function from Visual Studio contributes to the overall cost of the Function's storage account. However, the cost is very small (sub-$1) even if you publish your function thousands of times every month.
For more information about Azure Blob Storage pricing, visit https://azure.microsoft.com/en-us/pricing/details/storage/blobs/#pricing

What is the maximum capacity of Azure Blob Storage account?

If I am creating an Azure Storage Account v2, what is the maximum capacity (or maximum total size of files) we can store in blob storage? I see some docs talking about 500 TB as the limit. Does that mean once the storage account reaches that 500 TB limit, it will stop accepting uploads? Or is there a way to store more files by paying more?
It depends on the region. According to https://learn.microsoft.com/en-us/azure/azure-subscription-service-limits#storage-limits, US and Europe can have storage accounts of up to 2 PB; all other regions are capped at 500 TB. As mentioned by Alfred below, you can request an increase if you need to (see the new max sizes here: https://azure.microsoft.com/en-us/blog/announcing-larger-higher-scale-storage-accounts/)
I have yet to see a storage account hit the limit, but I would anticipate you would hit an error trying to upload a file at max capacity. I would advise designing your application to make use of multiple storage accounts to avoid hitting this limit (if you are expecting to exceed 500TB).
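If you do expect to exceed the cap, here is a minimal sketch of that multi-account design; the account names and the hashing scheme are illustrative assumptions:

# Pick one of N storage accounts deterministically from the blob name,
# so uploads spread across accounts instead of piling into one.
import hashlib

ACCOUNTS = ["mydata0", "mydata1", "mydata2", "mydata3"]   # hypothetical names

def account_for(blob_name):
    digest = hashlib.sha256(blob_name.encode()).digest()
    return ACCOUNTS[digest[0] % len(ACCOUNTS)]

print(account_for("2024/05/report.csv"))   # same name always maps to the same account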
You can ask support to increase the limit:
https://azure.microsoft.com/en-us/blog/announcing-larger-higher-scale-storage-accounts/

What are the limitations for zip file extraction in Azure Search?

My indexer shows an error for two zip files (about 800 MB each) saying that the zip file has "the size of ... bytes, which exceeds the maximum size for document extraction for your current service tier." Azure Search is already set to a Standard tier. Is the solution to go to a higher tier? Because from the documentation I gather that the limit is the same across all Standard tiers. If there is a limit to the size of zip files to be extracted, what is it? And would the solution for bigger files then be to unzip them before Azure Search?
Maximum running times exist to provide balance and stability to the service as a whole, but larger data sets might need more indexing time than the maximum allows. If an indexing job cannot complete within the maximum time allowed, try running it on a schedule. Here are the indexer limits you can refer to.
Note: A service is provisioned at a specific tier. Jumping tiers to gain capacity involves provisioning a new service (there is no in-place upgrade). For more information, see Choose a SKU or tier. To learn more about adjusting capacity within a service you've already provisioned, see Scale resource levels for query and indexing workloads.
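If the archives themselves are the problem, one workaround in line with the question's last idea is to unzip before indexing. Here is a minimal sketch using the azure-storage-blob package; the connection-string variable and container name are hypothetical placeholders:

# Expand a large zip and upload its members as individual blobs, so the
# Azure Search indexer sees ordinary files instead of an oversized archive.
import os
import zipfile
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONN"])
container = service.get_container_client("search-source")   # hypothetical container

with zipfile.ZipFile("big-archive.zip") as zf:
    for member in zf.namelist():
        if member.endswith("/"):     # skip directory entries
            continue
        with zf.open(member) as data:
            container.upload_blob(name=member, data=data, overwrite=True)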

What Azure Table IOPS limitations exist with the different Web App (or function) sizes?

I'm working with high-scale Azure Table operations, and I'm looking for documentation that describes the max IOPS to expect from Azure instance sizes for Azure Web Apps, Functions, etc.
The Web Roles and their corresponding limitations are well documented; for example, see this comment in the linked question:
so, we ran our tests on different instance sizes and yes that makes a huge difference. at medium we get around 1200 writes per second, on extra large we get around 7200. We are looking at building a distributed read/write controller possibly using the dcache as the middle man. – JTtheGeek Aug 9 '13 at 22:39
Question
What is the corresponding limitation for Web Apps (Logic, Mobile, etc.) and Azure Table IOPS?
According to the official documentation, the total request rate (assuming 1 KB object size) per storage account is up to 20,000 IOPS, entities per second, or messages per second. You can also get the VM max IOPS limits from the Azure VM sizes documentation. Web Apps run on an App Service plan, and within the plan you can choose different pricing tiers that map to different VM sizes, which you can use as a reference. For more Azure limits, please refer to Azure subscription and service limits, quotas, and constraints.
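To turn those figures into a rough capacity plan, here is a minimal sketch; the target rate is an assumed workload, and the per-instance rates are the ones from the quoted comment:

# How many storage accounts and instances does a target write rate need?
import math

ACCOUNT_CAP = 20_000                                     # ops/s per storage account (quoted limit)
PER_INSTANCE = {"medium": 1_200, "extra_large": 7_200}   # writes/s from the quoted test

target_writes_per_sec = 50_000                           # assumed workload
accounts = math.ceil(target_writes_per_sec / ACCOUNT_CAP)
instances = math.ceil(target_writes_per_sec / PER_INSTANCE["extra_large"])
print(f"{accounts} storage account(s), ~{instances} extra-large instance(s)")   # 3 accounts, ~7 instances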

Azure: Why is it advised to use multiple storage accounts for a virtual machine scale set?

In the documentation for Virtual Machine Scale Sets it says:
Spread out the first letters of storage account names as much as possible
I have a few questions about this:
Why should you use multiple Storage Accounts at all?
Why is Azure creating 5 Storage Accounts if I create a new Virtual Machine Scale Set through portal?
Why should I spread the first letters as much as possible?
The answer to this lies in the limits of Azure. If you look at the storage limits specifically, you will find that the storage account is capped at 20k IOPS.
Total Request Rate (assuming 1KB object size) per storage account
Up to 20,000 IOPS, entities per second, or messages per second
So that means your Scale Set would effectively be capped at 20,000 IOPS, no matter how many VMs you put in it.
As for the storage account naming, I have no clue, but looking at the templates they link to, they are not doing it:
"uniqueStringArray": [
"[concat(uniqueString(concat(resourceGroup().id, variables('newStorageAccountSuffix'), '0')))]",
"[concat(uniqueString(concat(resourceGroup().id, variables('newStorageAccountSuffix'), '1')))]",
"[concat(uniqueString(concat(resourceGroup().id, variables('newStorageAccountSuffix'), '2')))]",
"[concat(uniqueString(concat(resourceGroup().id, variables('newStorageAccountSuffix'), '3')))]",
"[concat(uniqueString(concat(resourceGroup().id, variables('newStorageAccountSuffix'), '4')))]"
],
I suspect this may be somehow linked to how the storage accounts are distributed among the nodes hosting them (say, accounts starting with 'A' are all hosted on the same cluster or nearby clusters).
It's about avoiding throttling
https://learn.microsoft.com/en-us/azure/storage/storage-scalability-targets
For standard storage accounts: A standard storage account has a maximum total request rate of 20,000 IOPS. The total IOPS across all of your virtual machine disks in a standard storage account should not exceed this limit.
You can roughly calculate the number of highly utilized disks supported by a single standard storage account based on the request rate limit. For example, for a Basic Tier VM, the maximum number of highly utilized disks is about 66 (20,000/300 IOPS per disk), and for a Standard Tier VM, it is about 40 (20,000/500 IOPS per disk), as shown in the table below.
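The arithmetic from that passage, as a minimal sketch:

# Highly utilized disks per standard storage account: account cap / per-disk IOPS.
ACCOUNT_IOPS_CAP = 20_000

for tier, disk_iops in {"Basic": 300, "Standard": 500}.items():
    print(f"{tier} tier VM: ~{ACCOUNT_IOPS_CAP // disk_iops} disks per account")
# Basic tier VM: ~66 disks per account
# Standard tier VM: ~40 disks per account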
There is no price difference between 5 storage accounts and 1, so why not 5?
If you create 5 storage accounts on different storage racks/stamps (datacenter infrastructure), you have less chance of being throttled, and a better chance of distributing traffic load. So I think those are the reasons.
