According to the new scalability targets for Azure, each partition in table storage is limited to 2000 entities/second.
I have been able to reach batch inserts of up to 16000 entities/second through parallelism.
For example, on my XtraLarge web role (8 cores), I am inserting 6400 entities (64 separate batch inserts of 100 entities each) across 64 simultaneous parallel tasks.
How is this possible? Is 2000 entities/second just the minimum performance expected from a partition?
They are scalability targets, not limits. As you point out, it is the minimum expected performance, not the maximum. I imagine that you are finding that, at that particular time, on that particular network and hardware, there is little contention for resources from other Azure customers. Also, note the size of your entities; Azure seems to perform well with small entities, and the targets assume the maximum entity size (1 MB). Be warned: although the unexpected performance may be useful, don't plan on it being there. Also make sure that you are coding for failure (500s and 503s) regardless of how often you hit the limits, otherwise when the storage service is underperforming, your app may begin to fail.
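To illustrate what "coding for failure" might look like, here is a minimal sketch using the 2.x .NET storage client library; the entity batch, retry count, and backoff schedule are illustrative assumptions, not a prescribed policy:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

public static class BatchInserter
{
    // Retries a 100-entity batch insert on 500/503 with exponential backoff.
    public static async Task InsertBatchWithRetryAsync(CloudTable table, TableBatchOperation batch)
    {
        const int maxAttempts = 5;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                await table.ExecuteBatchAsync(batch);
                return;
            }
            catch (StorageException ex) when (
                ex.RequestInformation.HttpStatusCode == 500 ||
                ex.RequestInformation.HttpStatusCode == 503)
            {
                if (attempt == maxAttempts) throw;
                // Back off before retrying while the service is throttling or underperforming.
                await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
            }
        }
    }
}
```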
Context
Note: To be precise, I have multiple data models on the same AAS instance; however, from viewing the size of those models along with the usage graphs, they don't seem to be impacting the memory usage by any significant amount. Therefore the discussion below focuses on the single data model that, to us, seems to be most correlated to the observed spikes.
I have a data model held in an Azure Analysis Services instance (the data model itself is a database inside the Azure Analysis Services instance). The data model has been deployed using Visual Studio to the Azure Analysis Services instance. It is essentially created using data straight from a SQL Server database (queries and stored procedures are used to create the tables under the hood).
Note: Within this data model there are 16 tables in total. The largest 2 (as defined by % of model occupied and other metrics, which can be viewed via the DAX Studio VertiPaq Analyzer) are the ones which have been partitioned day-wise with 60 daily partitions each (2022-04-11, 2022-04-12, ...) and are handled via the partitioning automation procedure outlined in the Resources section below. The remaining 14 tables haven't been partitioned in that sense and are fully processed each time the refresh function triggers the refresh (effectively each of those 14 tables consists of 1 single large partition, i.e. the whole table).
E.g. every hour, when our refresh function triggers, the latest 3 partitions of our 2 large tables are re-processed and each of the remaining 14 tables is re-processed fully (since each of them consists of only 1 big partition).
The refreshes of the data model are performed by a function app whose functions refresh the latest 3 day-wise partitions of the 2 largest tables in the data model, while the other tables are processed whole every time during a refresh.
Currently the function controlling the refresh execution is triggered every hour, at which point it performs a refresh of the data as described above.
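For reference, a minimal sketch of how an hourly refresh like this can be driven from a timer-triggered function with the Tabular Object Model; our actual setup uses the AsPartitionProcessing library linked in the Resources section, and the connection string, model, table, and partition names here are placeholders:

```csharp
using System;
using System.Linq;
using Microsoft.AnalysisServices.Tabular;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class HourlyRefresh
{
    // Fires at minute 0 of every hour.
    [FunctionName("HourlyRefresh")]
    public static void Run([TimerTrigger("0 0 * * * *")] TimerInfo timer, ILogger log)
    {
        var bigTables = new[] { "BigTableA", "BigTableB" };  // placeholder table names
        var server = new Server();
        try
        {
            server.Connect(Environment.GetEnvironmentVariable("AasConnectionString"));
            Model model = server.Databases.FindByName("MyDataModel").Model;

            // Re-process only the latest 3 daily partitions of the two large tables.
            foreach (string name in bigTables)
            {
                Table table = model.Tables[name];
                for (int i = 0; i < 3; i++)
                {
                    string partitionName = DateTime.UtcNow.Date.AddDays(-i).ToString("yyyy-MM-dd");
                    table.Partitions[partitionName].RequestRefresh(RefreshType.Full);
                }
            }

            // Fully re-process the remaining (single-partition) tables.
            foreach (Table table in model.Tables.Where(t => !bigTables.Contains(t.Name)))
            {
                table.RequestRefresh(RefreshType.Full);
            }

            model.SaveChanges();  // all requested refreshes are committed in one transaction
            log.LogInformation("Refresh completed.");
        }
        finally
        {
            server.Disconnect();
        }
    }
}
```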
The issue we have been facing is that when we observe our memory usage dashboard (see the screenshot below), we tend to get massive spikes in memory usage which seem to occur during this refresh phase.
In light of this observation we began trying to test out what seems to be causing these periodic spikes and observed the following interesting points:
Spikes align almost perfectly with the scheduled hourly refreshes of our data model.
Leading us to believe the spikes are related to the refresh process in some way.
In between these refreshes the memory usage drops significantly.
Further making us believe the spikes are caused by some part of the refresh activity and not by general usage.
Increasing the number of partitions (the time window of data for the 2 main tables) from 30 days to 60 days, and vice versa, causes clearly visible changes in the spikes.
If we go up from 30 to 60 days the spike amplitudes increase, and going the other way causes them to decrease.
Performing "Defragmentation process" as outlined in the white paper viewable via the link in the Resources section temporarily reduces the usage by a little.
By nature of this process its something that would be need to be performed on a regular basis to ensure continued benefits.
The tables that are fully processed each time (all tables in the main data model barring the 2 which only refresh the last 3 daily partitions) don't seem to have a high impact on the memory usage spikes.
We manually processed some of the largest tables one after another in between the refreshes and didn't notice a huge jump in the graphs.
Reducing the 3 day-wise partitions to 3 hourly partitions for the 2 main tables' refresh didn't seem to cause a big change either.
We noticed a small drop in memory usage (about 1-2 GB during the hourly refresh), but it didn't have as large an impact as we expected (i.e. proportional to the data reduction). This makes us think that the actual amount of data might not be the primary issue.
Screenshots
Here are some more details on the metrics used. Definitions:
Turquoise Line: Hard memory limit max (same as the max cache size of our AAS tier).
Dark Blue Line: High memory limit max (approx 80% of our Hard Memory limit).
Orange Line: Memory Usage max.
More details can be found in the following links: AAS Metrics, Memory Usage Forum Post.
Questions
Based on our scenario (described above), what could be the cause of the memory usage spikes during refresh, and how can we reduce and/or manage them (ideally removing them entirely, or as much as possible), so that usage always stays well below the turquoise and dark blue lines?
We feel that if we can figure this out it may allow us to stay within our current pricing tier and also potentially allow us to bring in more data (90-120 daily partitions) without the worry of hitting "Out of Memory" health alerts for our instance (which we have been receiving up until now with 60 days).
Note: Barring the current hourly refreshes, we are well within the tier limits in terms of memory usage (the orange line is much lower than the turquoise and blue thresholds). Thus solving this could free us to make better use of our AAS resources.
Current Thoughts
We do have calculated columns in our data model. Could this be causing the issue?
What would be the best way to test this?
Resources
I'll place any useful links to documentation in this section; hopefully they can aid in understanding the context.
GitHub link for the repo containing the tabular model refresh logic we based our process on:
https://github.com/microsoft/Analysis-Services/tree/master/AsPartitionProcessing
In the README.md make sure to click the link to the white paper which provides more detail.
Please try setting MaxParallelism to some low value like 2 or 3 in the ModelConfiguration table. This will reduce the number of tables and partitions it processes in parallel at once. This alone probably won't solve the memory spike issue, but it should lower the spikes a little at the expense of longer refresh times. If you can deal with this tradeoff and it spikes memory less, this may be a workaround.
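If you ever drive the refresh through TOM directly instead of through the sample's ModelConfiguration table, my understanding is that the equivalent knob is the MaxParallelism save option; a hedged sketch, assuming SaveOptions.MaxParallelism caps processing parallelism when the requested refreshes are committed:

```csharp
using Microsoft.AnalysisServices.Tabular;

// Inside your refresh code, after RequestRefresh has been called on the
// desired tables/partitions of a connected model:
var options = new SaveOptions { MaxParallelism = 2 };  // process at most 2 objects in parallel
model.SaveChanges(options);
```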
Please set IsAvailableInMDX to false on any hidden columns or hidden measure columns which are not put on an axis or referenced directly in an MDX query. This should reduce your memory footprint during processing because it will not build attribute hierarchies for those columns. On high cardinality columns the savings could be significant.
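A hedged TOM sketch of flipping that flag in bulk for hidden, non-key columns; verify the affected columns really are not referenced from MDX (e.g. by Excel pivot tables) before applying:

```csharp
using Microsoft.AnalysisServices.Tabular;

public static class MdxAvailability
{
    // Skip building attribute hierarchies for hidden, non-key columns to
    // reduce memory used during processing.
    public static void DisableForHiddenColumns(Model model)
    {
        foreach (Table table in model.Tables)
        {
            foreach (Column column in table.Columns)
            {
                if (column.IsHidden && !column.IsKey && column.IsAvailableInMDX)
                {
                    column.IsAvailableInMDX = false;
                }
            }
        }
        model.SaveChanges();
    }
}
```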
The next thing to try would be to split the tables/partitions into separate ModelConfiguration rows in the database. Then configure it to process one ModelConfiguration then the other sequentially. The goal here would be to process some tables in one transaction and other tables in a separate transaction. That should cause the memory usage required for each transaction to be less. Of course this may impact users in that half of the data will be stale after the first transaction so you will have to judge whether this is feasible.
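Expressed directly in TOM, the "two sequential transactions" idea looks roughly like this; the table groupings are illustrative, and the sample's separate ModelConfiguration rows achieve the same thing declaratively:

```csharp
using System.Linq;
using Microsoft.AnalysisServices.Tabular;

public static class SplitRefresh
{
    public static void RefreshInTwoTransactions(Model model)
    {
        var bigTables = new[] { "BigTableA", "BigTableB" };  // placeholder table names

        // Transaction 1: the two large tables.
        foreach (string name in bigTables)
            model.Tables[name].RequestRefresh(RefreshType.Full);
        model.SaveChanges();

        // Transaction 2: everything else. The peak memory of each commit is
        // lower than processing all 16 tables in a single transaction.
        foreach (Table table in model.Tables.Where(t => !bigTables.Contains(t.Name)))
            table.RequestRefresh(RefreshType.Full);
        model.SaveChanges();
    }
}
```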
A more complex optimization would be to scale out AAS and have a dedicated processing node. You could then process-clear the model before you fully process it, which should reduce the memory requirements the most. Once processing is done, you run the Synchronize command. You could even scale back in, removing the processing node, to save cost for the rest of the hour.
Another option to consider would be to deploy the models to Power BI Premium Gen2. The very interesting nuance with Gen2 is that a P1 capacity allows each dataset to be up to 25GB unlike Gen1 and unlike AAS S1 where the total of all datasets must be less than 25GB. If your organization already owns Power BI Premium capacities this should be a good option. If not then the cost probably won’t make sense at the moment. Or you could license each user with a Power BI Premium Per User license and deploy the model to that Premium Per User capacity. If you have under 70 users this may be a more cost effective option for you to try.
I read the docs several times, and I still don't understand: how do I figure out how many compute units I need?
I'm planning to use Spanner as the operational database for a webapp, so the storage should not be very big, definitely not in the beginning. But I do want to make sure I have enough RAM and CPU to handle a big load.
How do I know if 100 compute units are enough or I need 1000 units?
How do I know how much RAM and CPU one compute unit provides?
Because instance sizing is dependent on workload and schema design, it is recommended to perform a benchmark with some initial amount of processing units. First start out with a small amount of load and increase the load until CPU usage reaches the recommended limit [1]. Based on the operations/second observed in the benchmark with the allocated processing units, you can estimate the processing units needed for the target load.
[1] https://cloud.google.com/spanner/docs/cpu-utilization#recommended-max
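As a rough, made-up illustration of that extrapolation (the throughput numbers are placeholders; real sizing should come from your own benchmark at the recommended CPU limit):

```csharp
using System;

class SpannerSizingEstimate
{
    static void Main()
    {
        // Hypothetical benchmark result: 100 processing units sustained 1,500 ops/sec
        // at the recommended CPU utilization limit; the target load is 12,000 ops/sec.
        const double benchmarkUnits = 100;
        const double benchmarkOpsPerSec = 1500;
        const double targetOpsPerSec = 12000;

        double estimatedUnits = benchmarkUnits * (targetOpsPerSec / benchmarkOpsPerSec);

        // Round up to a valid increment (100-unit steps below 1,000 processing units).
        double provisionedUnits = Math.Ceiling(estimatedUnits / 100) * 100;  // -> 800

        Console.WriteLine($"Estimated processing units: {provisionedUnits}");
    }
}
```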
We migrated our mobile app (still being developed) from Parse to Azure. Everything is running, but the price of DocumentDB is so high that we can't continue with Azure without fixing that. We're probably doing something wrong.
1) The price seems to be bottlenecked by the DocumentDB requests.
Running a process to load the data (about 0.5 million documents), memory and CPU were OK, but the DocumentDB request limit was a bottleneck, and the price charged was very high.
2) Even after the end of this data migration (a few days of processing), Azure continues to charge us every day.
We can't understand what is going on here. The usage graphs are flat, but the price is still climbing, as you can see in the images.
Any ideas?
Thanks!
From your screenshots, you have 15 collections under the Parse database. With Parse: Aside from the system classes, each of your user-defined classes gets stored in its own collection. And given that each (non-partitioned) collection has a starting run-rate of ~$24/month (for an S1 collection), you can see where the baseline cost would be for 15 collections (around $360).
You're paying for reserved storage and RU capacity. Regardless of RU utilization, you pay whatever the cost is for that capacity (e.g. S2 runs around $50/month / collection, even if you don't execute a single query). Similar to spinning up a VM of a certain CPU capacity and then running nothing on it.
The default throughput setting for the Parse collections is 1000 RU/s. This will cost $60 per collection (at the rate of $6 per 100 RU/s). Once you finish the Parse migration, the throughput can be lowered if you believe the workload has decreased. This will reduce the charge.
To learn how to do this, take a look at https://azure.microsoft.com/en-us/documentation/articles/documentdb-performance-levels/ (Changing the throughput of a Collection).
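For reference, a sketch of what lowering a collection's throughput looks like with the .NET DocumentDB SDK of that era; the collection self-link and the 400 RU/s target are placeholders, and the portal can do the same thing:

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

public static class ThroughputTuner
{
    public static async Task LowerThroughputAsync(DocumentClient client, string collectionSelfLink)
    {
        // Find the offer (throughput reservation) attached to the collection.
        Offer offer = client.CreateOfferQuery()
            .Where(o => o.ResourceLink == collectionSelfLink)
            .AsEnumerable()
            .Single();

        // Replace it with a lower reservation, e.g. 400 RU/s, once the migration load is over.
        await client.ReplaceOfferAsync(new OfferV2(offer, 400));
    }
}
```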
The key thing to note is that DocumentDB delivers predictable performance by reserving resources to satisfy your application's throughput needs. Because application load and access patterns change over time, DocumentDB allows you to easily increase or decrease the amount of reserved throughput available to your application.
Azure is a "pay-for-what-you-use" model, especially around resources like DocumentDB and SQL Database where you pay for the level of performance required along with required storage space. So if your requirements are that all queries/transactions have sub-second response times, you may pay more to get that performance guarantee (ignoring optimizations, etc.)
One thing I would seriously look into is the DocumentDB Cost Estimation tool; this allows you to get estimates of throughput costs for your transaction types, based on sample JSON documents you provide:
So in this example, I have an 8KB JSON document, I expect to store 500K of them (to get an approximate storage cost), and I specify that I need throughput to create 100 documents/sec, read 10/sec, and update 100/sec (I used the same document as an example of what the update will look like).
NOTE this needs to be done PER DOCUMENT -- if you're storing documents that do not necessarily conform to a given "schema" or structure in the same collection, then you'll need to repeat this process for EVERY type of document.
Based on this information, I can use those values as inputs into the pricing calculator. This tells me that I can estimate about $450/mo for DocumentDB services alone (if this were my anticipated usage pattern).
There are additional ways you can optimize Request Units (RUs, the metric used to measure the cost of a given request/transaction, and what you're getting billed for): optimizing index strategies, optimizing queries, etc. Review the documentation on Request Units for more details.
Each index batch is limited to between 1 and 1000 documents. When I call it from my local machine or an Azure VM, I get 800 ms to 3000 ms per 1000-document batch. If I submit multiple batches asynchronously, the time spent is roughly the same. That means it would take 15-20 hours for my ~50M document collection.
Is there a way I can make it faster?
It looks like you are using our Standard S1 search service. There are a lot of things that can impact how fast data can be ingested, but I would expect to see ingestion to a single-partition search service at a rate of about 700 docs/second for an average index, so I think your numbers are not far off from what I would expect. Please note that these are purely rough estimates and you may see different results based on any number of factors (such as the number of fields, quantity of facets, etc.).
It is possible that some of the extra time you are seeing is due to the latency of uploading the content from your local machine to Azure; it would likely be faster if you did this directly from Azure, but if this is just a one-time upload that is probably not worth the effort.
You can slightly increase the speed of data ingestion by increasing the number of partitions you have, and the S2 search service will also ingest data faster, although both of these come at a cost.
By the way, if you have 50M documents, please make sure that you allocate enough partitions, since a single S1 partition can handle 15 million documents or 25 GB, so you will definitely need extra partitions for this service.
Also, as another side note, when you are uploading your content (and especially if you choose to do parallelized uploads), keep an eye on the HTTP responses, because if the search service exceeds the resources available you could get HTTP 207 (indicating one or more items failed to apply) or 503s indicating the whole batch failed due to throttling. If throttling occurs, you would want to back off a bit to let the service catch up.
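For illustration, a sketch of that retry/back-off pattern with the Microsoft.Azure.Search .NET SDK; the document class and key selector are assumptions, and a whole-batch 503 surfaces as a different exception type whose handling is not shown here:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Search;
using Microsoft.Azure.Search.Models;

public class MyDoc
{
    public string Id { get; set; }       // placeholder key field
    public string Content { get; set; }  // placeholder payload
}

public static class SearchUploader
{
    public static async Task UploadWithRetryAsync(ISearchIndexClient indexClient, IEnumerable<MyDoc> docs)
    {
        var batch = IndexBatch.Upload(docs);
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                await indexClient.Documents.IndexAsync(batch);
                return;
            }
            catch (IndexBatchException ex)
            {
                // HTTP 207: only some actions failed; retry just those after backing off.
                batch = ex.FindFailedActionsToRetry(batch, d => d.Id);
                if (attempt == 5) throw;
                await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
            }
        }
    }
}
```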
I think you're reaching the request capacity:
https://azure.microsoft.com/en-us/documentation/articles/search-limits-quotas-capacity/
I would try another tier (S1, S2). If you still face the same problem, try to get in touch with the support team.
Another option:
Instead of pushing data, try adding your data to Blob storage, DocumentDB, or SQL Database, and then use the pull approach:
https://azure.microsoft.com/en-us/documentation/articles/search-howto-indexing-azure-blob-storage/
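And a rough sketch of wiring up that pull approach with the same SDK; the data source, container, and index names are placeholders, and this can also be configured entirely in the portal:

```csharp
using Microsoft.Azure.Search;
using Microsoft.Azure.Search.Models;

public static class BlobIndexerSetup
{
    public static void CreatePullIndexer(SearchServiceClient serviceClient)
    {
        // Point the service at a blob container holding the documents...
        var dataSource = DataSource.AzureBlobStorage(
            "docs-datasource", "<storage connection string>", "docs");
        serviceClient.DataSources.CreateOrUpdate(dataSource);

        // ...and let an indexer pull from it into an existing index.
        var indexer = new Indexer("docs-indexer", "docs-datasource", "docs-index");
        serviceClient.Indexers.CreateOrUpdate(indexer);
        serviceClient.Indexers.Run("docs-indexer");
    }
}
```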
I'm running performance tests against ATS (Azure Table Storage) and it's behaving a bit weirdly when using multiple virtual machines against the same table / storage account.
The entire pipeline is non-blocking (await/async) and uses TPL for concurrent and parallel execution.
First of all, it's very strange that with this setup I'm only getting about 1200 insertions per second. This is running on an L VM box, that is, 4 cores + 800 Mbps.
I'm inserting 100,000 rows with unique PKs and unique RKs, which should leverage the ultimate distribution.
Even more telling is the following deterministic behavior:
When I run 1 VM I get about 1200 insertions per second.
When I run 3 VMs I get about 730 insertions per second on each.
It's quite humorous to read the blog post where they specify their targets:
https://azure.microsoft.com/en-gb/blog/windows-azures-flat-network-storage-and-2012-scalability-targets/
Single Table Partition - a table partition is all of the entities in a table with the same partition key value, and usually tables have many partitions. The throughput target for a single table partition is:
Up to 2,000 entities per second
Note, this is for a single partition and not a single table. Therefore, a table with good partitioning can process up to 20,000 entities/second, which is the overall account target described above.
What should I do to be able to utilize the 20k per second, and how is it possible to execute more than 1.2k per VM?
--
Update:
I've now also tried using 3 storage accounts, one for each individual node, and I'm still getting the same performance / throttling behavior, which I can't find a logical reason for.
--
Update 2:
I've optimized the code further and now I'm able to execute about 1550 insertions per second.
--
Update 3:
I've now also tried in US West. The performance is worse there. About 33% lower.
--
Update 4:
I tried executing the code from an XL machine, which has 8 cores instead of 4 and double the amount of memory and bandwidth, and got a 2% increase in performance, so clearly this problem is not on my side.
A few comments:
You mention that you are using unique PK/RK to get ultimate distribution, but you have to keep in mind that the PK balancing is not immediate. When you first create a table, the entire table will be served by 1 partition server. So if you are doing inserts across several different PKs, they will still be going to one partition server and be bottlenecked by the scalability target for a single partition. The partition master will only start splitting your partitions among multiple partition servers after it has identified hot partition servers. In your <2 minute test you will not see the benefit of multiple partition servers or PKs. The throughput in the article is targeted towards a well-distributed PK scheme with frequently accessed data, causing the data to be divided amongst multiple partition servers.
The size of your VM is not the issue, as you are not blocked on CPU, memory, or bandwidth. You can achieve full storage performance from a small VM size.
Check out http://research.microsoft.com/en-us/downloads/5c8189b9-53aa-4d6a-a086-013d927e15a7/default.aspx. I just now did a quick test using that tool from a WebRole VM in the same datacenter as my storage account and I achieved, from a single instance of the tool on a single VM, ~2800 items per second upload and ~7300 items per second download. This is using 1024-byte entities, 10 threads, and a batch size of 100. I don't know how efficient this tool is or if it disables Nagle's Algorithm, as I was unable to get great results (I got ~1000/second) using a batch size of 1, but at least with the 100 batch size it shows that you can achieve high items/second. This was done in US West.
Are you using Storage Client Library 1.7 (Microsoft.WindowsAzure.StorageClient.dll) or 2.0 (Microsoft.WindowsAzure.Storage.dll)? The 2.0 library has some performance improvements and should yield better results.
I suspect this may have to do with TCP Nagle.
See this MSDN article and this blog post.
In essence, TCP Nagle is a protocol-level optimization that batches up small requests. Since you are sending lots of small requests, this is likely to negatively affect your performance.
You can disable TCP Nagle by executing this code when starting your application:
ServicePointManager.UseNagleAlgorithm = false;
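For completeness, these related ServicePoint settings are commonly adjusted alongside Nagle in Azure Storage client guidance; treat the connection limit value below as an illustrative starting point rather than a recommendation:

```csharp
using System.Net;

// Run once at application startup, before any storage calls open connections.
ServicePointManager.UseNagleAlgorithm = false;    // avoid delaying small requests
ServicePointManager.Expect100Continue = false;    // skip the extra 100-Continue round trip
ServicePointManager.DefaultConnectionLimit = 100; // allow more concurrent connections than the low default
```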
Are the compute instances and storage account in the same affinity group? Affinity groups ensure that network proximity between the services is optimal and should result in lower latency at the network level.
You can find affinity group configuration under the network tab.
I would tend to believe that the maximum throughput is for an optimized load. For example, I bet you can achieve higher performance using batch requests than with the individual requests you are doing now. And of course, if you use GUIDs for your PK, you can't batch in your current test.
So what if you changed your test to batch-insert entities in groups of 100 (the maximum per batch), still using GUIDs, but where each group of 100 entities shares the same PK?
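A sketch of that suggestion with the 2.x storage library; the entity type and PK scheme are illustrative, and each call below is a single entity group transaction against one partition instead of 100 individual requests:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

public class TestEntity : TableEntity
{
    public TestEntity() { }
    public TestEntity(string pk, string rk) : base(pk, rk) { }
    public string Payload { get; set; }
}

public static class BatchedInserts
{
    public static async Task InsertGroupAsync(CloudTable table)
    {
        // All 100 entities in this batch share a PK, so they can go in one batch operation.
        string partitionKey = Guid.NewGuid().ToString();
        var batch = new TableBatchOperation();
        for (int i = 0; i < 100; i++)
        {
            batch.Insert(new TestEntity(partitionKey, Guid.NewGuid().ToString()) { Payload = "test" });
        }
        await table.ExecuteBatchAsync(batch);  // one round trip instead of 100
    }
}
```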